1. Introduction

During the period of this contract, we demonstrated very high speed
operation of several superconducting signal processing circuits.
These include two kinds of A/D converters: a flash type for the
highest speeds and a delta-sigma circuit for high resolution. Out of
the work on the flash-type A/D converter grew a new logic family
with potential for very fast operation. As a part of the same work, a
scheme was proposed for using CMOS circuits built into the substrate
to calibrate the input stage of the A/D converter in order to
increase its dynamic range. We designed a tightly specified
serial-to-parallel decoder, which has been shown to operate
correctly with 2 Gbit/s input data. In addition, a current-steering
shift register was evaluated further, and a flux-shuttle shift
register was designed and tested.

The project has included architecture and software studies. We have
devoted some of our effort to the question of which signal
processing architectures suit the different logic families, and have
evaluated some of the emerging logic families. We have also
developed additional computer-aided-design tools to add to our
earlier work, which produced the JSPICE and JSIM circuit simulators
that are widely used in this field. The new tools include a program
that determines the dc superconducting state of a circuit and a
program for extracting inductances in superconducting integrated
circuits.

Our niobium integrated circuit process has been developed further
under this contract, and we have demonstrated junction fabrication
with small spreads of critical currents and successful fabrication
of signal processing circuits. We have also introduced innovations
in insulator formation for niobium circuit processing.

2. Flash-Type Analog-to-Digital Converter

During the period of this contract we extended the evaluation of an
idea, developed during the preceding period of Air Force support,
for a fully parallel flash-type A/D converter. The input comparator
circuit consists of a one-junction SQUID, the inductor of which is
the control line of a two-junction SQUID latching output stage. The
initial idea and the initial demonstration in both simulation and
experiment were due to E. S. Fang and appeared in publications and
in his Ph.D. dissertation [1,2]. Simulations predict that a 4-bit
A/D converter with this comparator circuit could be sampled at 20
gigasamples/s and have an analog bandwidth of 10 GHz.

This work is being continued by another student, H. Luong, who has
modified the design and optimized the parameters to increase the
circuit margins. The present configuration of the comparator
circuit is shown in Fig. 1 [3]. A two-phase clock is used, with the
first phase applied to the one-junction sampling SQUID and the
second phase applied to the latching read-out two-junction SQUID.
In order to achieve the desired bandwidth-resolution product, it is
necessary to have a very short aperture time. In this design the
small aperture time is achieved by adding a sharp pulse to the bias
and signal applied to the input one-junction SQUID. The pulse is
derived from the phase-one clock, which also serves as a bias. The
junction Jg was added to improve margins. The clock junction in the
output stage acts as a regulator. The correct operation of the
comparator stage has been verified both at low speed and with inputs
up to 3 GHz.

Fig. 1. Circuit diagram for the comparator of the flash-type A/D
converter.

Fang proposed using a modification of the comparator circuit as a
logic gate for the encoder, which takes the thermometer code from
the output of the converter stage and converts it to binary. An
advantage of this kind of logic is that inverters are easily
realized. Luong subsequently revised and simplified the design of
the encoder [3]. A 3-bit encoder has been fabricated and shown to
function correctly at 2 Gbit/s. Further work will combine the
comparator and encoder as in Fig. 2 and will extend the complete
converter to four bits. Figure 3 shows the truth table and the
results of the low-speed demonstration of the 3-bit encoder. The
follow-on work is being conducted with support of the multi-agency
University Research Initiative.
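
The encoder's task can be illustrated with a short behavioral sketch.
It is not the gate-level Josephson encoder of Fig. 2; it only shows
the mapping from a thermometer code to binary, here done by counting
the comparators that have switched. The function names and the
MSB-first bit order are assumptions made for the illustration.

```python
def thermometer_code(level, n_bits):
    """Comparator outputs (lowest threshold first) for an input that
    exceeds `level` of the 2**n_bits - 1 reference thresholds."""
    n_comp = 2 ** n_bits - 1
    if not 0 <= level <= n_comp:
        raise ValueError("level out of range")
    return [1 if i < level else 0 for i in range(n_comp)]


def encode(thermo):
    """Convert a thermometer code to binary (MSB first) by counting ones."""
    n_comp = len(thermo)
    n_bits = (n_comp + 1).bit_length() - 1
    if 2 ** n_bits - 1 != n_comp:
        raise ValueError("expected 2**n - 1 comparator outputs")
    count = sum(thermo)
    return [(count >> k) & 1 for k in reversed(range(n_bits))]


if __name__ == "__main__":
    for level in range(8):
        print(thermometer_code(level, 3), "->", encode(thermometer_code(level, 3)))
```

For a 3-bit converter this walks through the eight rows of a truth
table of the kind shown in Fig. 3.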

We evaluated the possibility of using Fang's comparator/logic gate
in a flux-transfer configuration in which the junctions are
nonhysteretic and hence do not latch into a voltage state [4]. A
shift register with ±20% margins was simulated at 50 GHz. Other
circuits simulated include a buffer, an XOR gate, an OR/AND gate,
an inverting gate, and a fan-out gate. Because the junctions used
were nonhysteretic, this logic family is potentially useful for
high-temperature superconductors, for which only nonhysteretic
junctions may be available.
Fig. 2. Implementation of a complete three-bit A/D converter.

Fig. 3. Truth table for three-bit encoder and low-speed
measurements for (a) the first four input patterns shown in the
truth table and (b) the last four input patterns shown in the
truth table. From top to bottom are the three clocks, the three
inputs, and the three outputs, respectively.

A further aspect of the A/D converter that was proposed by Fang is
the use of CMOS circuits to do a self-calibration of the comparator
stage [1]. Because of variations of parameters inherent in the
fabrication process, it is difficult to extend beyond four bits of
resolution. With five bits, for example, the steps in the digital
staircase of references for the 31 comparators are each about 3% of
full scale. Variations of, say, the critical currents of the
junctions would be greater than 3%, so the staircase would not be
monotonic. The solution is to adjust the biases on the comparators
to make up for the variations of the circuit parameters. A CMOS
circuit was proposed that would measure the switching point for
each comparator and adjust the bias to achieve the ideal value.
Each comparator would be adjusted in turn upon initial excitation
of the circuit. The CMOS circuit turns itself off when the
calibration is completed, and the A/D conversion involves only the
Josephson components. Another Ph.D. student is following up this
suggestion with the support of the University Research Initiative.
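
The arithmetic behind the five-bit example can be sketched as
follows. The helper names are hypothetical, and the threshold errors
are expressed as a fraction of full scale, which is an assumption
made for the illustration; the source states only that the
critical-current variations exceed 3%.

```python
def flash_requirements(n_bits):
    """Number of comparators and reference step (fraction of full scale)."""
    levels = 2 ** n_bits
    return levels - 1, 1.0 / levels


def thresholds_with_errors(n_bits, errors):
    """Ideal staircase of references plus a per-comparator error."""
    comparators, step = flash_requirements(n_bits)
    return [(i + 1) * step + errors[i] for i in range(comparators)]


def is_monotonic(thresholds):
    """True if every reference lies above the one before it."""
    return all(b > a for a, b in zip(thresholds, thresholds[1:]))


if __name__ == "__main__":
    print(flash_requirements(5))  # 31 comparators, steps of 1/32 of full scale
    for spread in (0.01, 0.02):
        # Worst case: neighbouring comparators err in opposite directions.
        errors = [spread if i % 2 == 0 else -spread for i in range(31)]
        print(spread, is_monotonic(thresholds_with_errors(5, errors)))
```

With opposite-sign errors on neighbouring comparators, the staircase
stops being monotonic once the error exceeds half a step, which is
the situation the proposed self-calibration is meant to correct.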

3. Delta-Sigma Analog-to-Digital Converter

Some applications for A/D converters require high resolution, and
efforts have been directed toward using the high speed of Josephson
electronics for this purpose. Several projects in superconductor
technology have employed the counting architecture, in which a
series of pulses is generated by an input SQUID as the analog
signal varies and the number of flux quanta in the SQUID changes. A
series of counting SQUIDs follows and counts the pulses over a
given interval. The result is a binary representation of the
average signal amplitude during that timing interval. P. H. Xiao in
our group has taken a different approach to try to avoid some of
the problems of the counting A/D converter; we are seeking to
emulate the high-resolution delta-sigma A/D converter popular in
semiconductor technology, for which the block diagram is shown in
Fig. 4.

A key component of the delta-sigma circuit is an integrator, which
requires an amplifier. No suitable amplifier exists in
superconductive technology, but a low-pass filter was recognized as
having the same frequency characteristics. The lack of an amplifier
can be made up for by a very sensitive comparator in the one-bit
quantizer that follows the filter. Two kinds of simulations have
been done to evaluate the performance. The result for an
oversampling ratio of 128 is a signal-to-noise ratio of 70 dB,
which corresponds to 11 bits of resolution. Xiao has devised a
modulator stage (Fig. 5) which accepts the analog input signal and
a clock and provides at its output a density-modulated train of
single bits at 1 Gbit/s [5]. The functioning of the components of
the modulator has been verified at 1 Gbit/s; subsequent testing
will show operation of the entire modulator.
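
The correspondence between 70 dB and 11 bits follows from the
standard relation for an ideal quantizer driven by a full-scale
sine wave, SNR = 6.02 N + 1.76 dB. The sketch below applies it; the
function name is chosen for the illustration, and the report does
not say which relation was used.

```python
def effective_bits(snr_db):
    """Resolution in bits implied by a signal-to-noise ratio in dB,
    using SNR = 6.02 * N + 1.76 dB for an ideal N-bit quantizer."""
    return (snr_db - 1.76) / 6.02


if __name__ == "__main__":
    print(f"{effective_bits(70.0):.2f} bits")
```

A 70 dB signal-to-noise ratio gives slightly more than 11 bits,
consistent with the figure quoted above.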

To complete the conversion, the modulator is followed by a
decimation filter which suppresses the out-of-band high-frequency
quantization noise, prevents the aliasing of the out-of-band
signal into the passband, maintains the passband ripple to within
specifications, and down-samples the output signal. These
functions can be accomplished by a cascade of a linear-phase sinc
FIR filter and an IIR low-pass filter. Both of these filters can
be realized in superconductor technology, but only the FIR filter
is being planned for now. Since the output of the FIR filter is at
a low data rate, external test equipment will be used to replace
the IIR filter's function. This continuation of the project will
be supported by the University Research Initiative.

Fig. 4. Structure of a semiconductor delta-sigma converter.

Fig. 5. Modulator for a superconducting delta-sigma A/D converter
employing a low-pass filter. (Figure labels: Input Transformer,
Feedback Transformer, Low-pass Filter, Huffle D/A Converter,
Readout SQUID.)
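
The division of labor between modulator and decimation filter can be
illustrated with a minimal numerical sketch. It assumes a textbook
first-order loop with an ideal integrator (the semiconductor-style
structure of Fig. 4, not the low-pass-filter modulator of Fig. 5) and
uses a plain moving average, which is a first-order sinc filter, as
the decimator. All names are hypothetical.

```python
def first_order_modulator(samples):
    """Turn samples in [-1, 1] into a density-modulated stream of +1/-1."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback          # accumulate the error
        feedback = 1.0 if integrator >= 0 else -1.0  # one-bit quantizer
        bits.append(feedback)
    return bits


def boxcar_decimate(bits, ratio):
    """Average non-overlapping blocks of `ratio` bits (first-order sinc
    filter followed by down-sampling by `ratio`)."""
    return [
        sum(bits[i:i + ratio]) / ratio
        for i in range(0, len(bits) - ratio + 1, ratio)
    ]


if __name__ == "__main__":
    ratio = 128                       # oversampling ratio quoted above
    stream = first_order_modulator([0.25] * (ratio * 8))
    print(boxcar_decimate(stream, ratio))
```

For a constant input the averaged output settles close to the input
value, which is the sense in which the bit density carries the
signal. A practical decimator would use a higher-order sinc filter
and a following low-pass stage, as described above.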

4.  Shift  Register 

The previous Air Force contract supported a study of a dc-powered
shift register in which current steering between the legs of a
superconducting loop was used to represent "0"s and "1"s [6].
Early work during this contract period continued the evaluation
[7]. It became clear that, although the dc powering is an
advantage, there were some serious drawbacks, including size and
complexity. We decided to look further at an early suggestion for
a shift register, the flux shuttle, which comprises a parallel
connection of two-junction SQUIDs with three-phase powering to
shuttle flux quanta (representing stored bits) along with the
clock [8]. This shift register dissipates power only when
transferring data (except for the current sources) and is
compatible with our A/D converter design, which will allow future
combination of the two into a signal-acquisition subsystem. In the
structure chosen (Fig. 6), the three-phase power is directly
coupled rather than magnetically coupled because the former has
larger margins. This type of circuit uses nonhysteretic junctions
and is therefore compatible with high-temperature superconductors.
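
The three-phase shuttling can be pictured with a purely behavioral
sketch. It does no circuit simulation: each bit position is taken to
occupy three cells, cell i is assumed to be driven by clock phase
i mod 3, and a flux quantum is simply moved one cell to the right
when the next cell's phase is applied. The cell numbering, the write
on phase 0, and the function names are assumptions made for the
illustration.

```python
def apply_phase(cells, phase, data_in=0):
    """Apply one clock phase; return (new cell contents, bit shifted out)."""
    nxt = list(cells)
    shifted_out = 0
    if phase == 0:
        # The last cell hands its quantum to the (unmodeled) readout,
        # and the write line injects the new bit into the first cell.
        shifted_out = cells[-1]
        nxt[-1] = 0
        nxt[0] = data_in
    # Every driven cell pulls in the quantum held by its left neighbour.
    for i in range(phase if phase else 3, len(cells), 3):
        if cells[i - 1]:
            nxt[i - 1] = 0
            nxt[i] = 1
    return nxt, shifted_out


def shift(bits_in, n_bits):
    """Clock `bits_in` through a register of `n_bits` positions."""
    cells = [0] * (3 * n_bits)
    out = []
    for b in bits_in:
        cells, o = apply_phase(cells, 0, b)
        out.append(o)
        cells, _ = apply_phase(cells, 1)
        cells, _ = apply_phase(cells, 2)
    return out, cells


if __name__ == "__main__":
    print(shift([1, 0, 1, 1, 0, 0], n_bits=2))
```

In this model each stored bit advances one position per full clock
cycle, so the output reproduces the input delayed by the register
length.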

The structure was thoroughly analyzed, with much emphasis on the
method of readout. For some applications, such as demultiplexing
or in filters, correlators, or convolvers, it is necessary to read
at every stage, so the effect of the readout on margins and speed
is important. Several different readout circuits were studied and
evaluated experimentally. Shift registers of various lengths were
fabricated and tested. The test data and margins for a 6-bit
version are shown in Fig. 7 for clocking at 1 GHz. The limitation
of speed to 1 GHz was due to the test facilities. Work on the
shift register to extend its length, increase the testing speed,
and combine it with an A/D converter will be carried out under the
follow-on University Research Initiative.

Fig. 6. Basic form of the direct-injection flux-shuttle shift
register, not showing reading circuits. (Figure labels: Data
write; Single bit, three cells.)

5. Bit-Serial Decoder

One of the proposed applications of superconductive electronics is
a crossbar switch which would interconnect 128 semiconductor
processors with an equal number of memories. When a processor
attempts to send data to memory, it sends a 2 Gbit/s train of bits
containing the address and the data. In order to make the desired
connection, decoders are needed. The chosen architecture employs a
set of four decoders, each capable of selecting one of 32 lines
with a 5-bit address, so the set of four decoders can choose one
of 128 lines. The entire address is in the first seven bits
received from the processor. The decoder has two parts. The first
converts the serial address bits into parallel form, and the
second takes five binary-coded bits and uses them to choose one of
32 lines. This project by D. A. Feld involved mainly the second
part, but some work was also done on the first and we did the
final demonstration of the combination.

Margins of the 6-Bit Shift Register with Series Junction Read-Out
at 1 GHz

Parameter       Nominal    Margin (dB)    Margin (%)
I*              350 µA     N/A            +29/-31
V (write)       300 mV     N/A            +53/>-17
[illegible]     41 mV      N/A            N/A
[illegible]     138 mV     +/-2           +26/-21
[illegible]     112 mV     +/-2.5         +34/-25
[illegible] 3   127 mV     +/-2.5         +34/-25

Fig. 7. Oscilloscope photograph showing test data and table of
margins for 6-bit long flux-shuttle shift register with operation
at 1 GHz. Top trace is readout of first bit and lower trace is
that of the fourth bit.

The emphasis in Feld's work was to invent circuits that can meet a
tight set of constraints for the parallel part of the decoder and
to demonstrate that one of 32 lines can be chosen at a 2 Gbit/s
rate [9,10]. For system considerations such as the need to limit
total crossbar current and power and to fit the entire crossbar
switch on a 1-cm chip, the parallel decoder current was limited to
6 mA and the power to 250 µW. The size of the circuit was required
to be no more than 0.4 mm by 1.5 mm. It was also required that the
margins be large. In order to meet the limitation on current and
still keep the gate currents high enough to avoid noise switching,
it was necessary to devise multi-input logic circuits. The basic
logic structure is shown in Fig. 8; the inputs are the five bits
designated A-E. The entire parallel decoder contains 32 of these
units to decode the 32 combinations of A-E, and is shown in Fig. 9.
All of the specifications were met.

We  worked  with  Hypres,  Inc.  to  eliminate  some  problems  from 
the  serial-to-parallel  converter  and  to  combine  the  two  parts  of 
the  decoder.  The  final  structure  is  shown  in  Fig.  10;  it  contains 
144  junctions.  Feld  performed  tests  of  the  decoder  and  showed  that 
it  functioned  completely  and  correctly  with  data  input  of  2 
Gbits/s. 
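
The two-step decoding described in this section can be summarized
in a behavioral sketch. The report states that the address occupies
the first seven bits and that each of four decoders selects one of
32 lines from five bits. How the seven bits are divided and ordered
is not stated, so the split used here (two bits to pick the decoder,
then five bits MSB first) and all function names are assumptions.

```python
def serial_to_parallel(bit_stream, n_address_bits=7):
    """Split a serial bit train into parallel address bits and data bits."""
    bits = list(bit_stream)
    if len(bits) < n_address_bits:
        raise ValueError("bit train shorter than the address field")
    return bits[:n_address_bits], bits[n_address_bits:]


def decode_one_of_32(five_bits):
    """Return a one-hot list of 32 lines selected by five bits (MSB first)."""
    if len(five_bits) != 5:
        raise ValueError("expected five address bits")
    index = 0
    for b in five_bits:
        index = (index << 1) | b
    return [1 if i == index else 0 for i in range(32)]


def select_line(address_bits):
    """Return (decoder number 0..3, line number 0..31) for a 7-bit address."""
    decoder = address_bits[0] * 2 + address_bits[1]
    lines = decode_one_of_32(address_bits[2:])
    return decoder, lines.index(1)


if __name__ == "__main__":
    address, data = serial_to_parallel([1, 0, 0, 0, 1, 0, 1, 1, 1, 0])
    print(select_line(address), data)
```

Four decoders of 32 lines each cover the 128 memory connections of
the crossbar.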

6.  Architecture  Study 

Very few workers in Josephson digital circuit technology are
knowledgeable in computer and signal processing architecture. We
took advantage of expertise (J. Fleischman) within our group to do
a study of architectures appropriate to the various logic families
[8,11]. Some of the general issues addressed included statistics
on the use of various kinds of instructions in general purpose
computers, speed improvement by use of cache memories, pipelining
and some of the attendant problems such as the formation of
"bubbles" of inactivity in the pipelines, parallelism, and
synchronous vs. asynchronous operation.

Fig. 8. Basic logic structure of the parallel decoder, comprising a
three-input NOR gate feeding a two-input multi-output NOR gate.

There are three main categories of superconductive logic gate as
defined by the type of clocking, and these define special issues
in the architecture. (1) The quantum flux parametron (QFP) and the
new Fang (Sec. 2) logic are fully synchronous at the gate level.
One logic operation is done on each clock cycle, and the clock
must run at a high frequency for high-speed logic. The system is
pipelined at gate level. The deep pipeline and high clock speed
make full utilization difficult in most digital systems. (2) Most
voltage-state logic allows ripple-through on one clock phase, that
is, locally asynchronous operation, with the data being picked up
in another set of gates during the next phase of the clock or,
alternatively, held in a latch during a transition of a
single-phase clock. The ripple-through capability makes possible
low latency but does not provide high throughput. (3) The rapid
single flux quantum (RSFQ) logic can be fully asynchronous, with
timing set by internally generated pulses. Asynchronous circuits
implemented in RSFQ logic potentially have very high throughput. A
complete computing system using RSFQ would only require
synchronization at block level.


General purpose computing is unlikely in the near future for
systems composed entirely of superconductive components because of
RAM and cache requirements, though these may be alleviated by the
proposed hybrid Josephson-CMOS memory structures.

In digital signal processing, random-access memory can be replaced
by more dedicated memory structures. In addition, a data flow
architecture with high levels of pipelining can be selected to
maximize throughput. With these choices, digital signal processing
is the most likely prospect for implementation with
superconductive digital electronics.

7. Computer-Aided-Design (CAD) Tools

The  nonlinear  behavior  of  Josephson  circuits  demands  computer 
tools  for  their  simulation;  furthermore,  useful  digital  circuits 
such  as  filters,  multipliers,  etc.  involve  large  numbers  of  logic 
gates  (typically  over  1000)  and  cannot  be  optimally  designed  by 
hand.  In  work  under  earlier  Air  Force  support,  we  developed 
simulation  tools  that  can  be  used  to  evaluate  the  dynamic 
performance  of  circuits  involving  small  numbers  of  gates.  We  first 
modified  the  SPICE  simulator  to  include  a  model  for  the  Josephson 
junction  and  called  the  modified  program  JSPICE.  In  Air  Force 
sponsored  work  just  preceding  the  contract  period  of  this  report, 
we  devised  a  simulator  with  a  computational  algorithm  specialized 
to  Josephson  peculiarities,  called  JSIM.  It  can  do  simulations 
about  an  order  of  magnitude  faster  than  JSPICE  and  is  widely  used. 

Two new programs were developed under this contract; one provides
a way of finding operating points and dc transfer characteristic
curves of Josephson circuits in the superconducting state [12] and
the other is for extraction of inductance parameters from a
circuit layout [13].

For the program to find operating points, E. S. Fang used a
mixed-mode method, which combines source stepping and time-domain
calculations. Josephson circuit equations are often multivalued,
which implies the existence of multiple solutions. When the paths
taken by the independent sources are specified, only one of the
many possible solutions can be physical. The mixed-mode algorithm
follows the paths of the independent sources, detects
ill-conditioned points, and converges to stable points on the
characteristic curves of the simulated circuit. The algorithm was
implemented and case studies were done. The method and techniques
are suitable for implementation in a general circuit simulator.
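
The idea of combining source stepping with time-domain calculation
can be illustrated on the simplest multivalued case. The sketch
below is not the algorithm of [12]. It assumes the standard
normalized relation for a one-junction SQUID, phi + beta*sin(phi) =
phi_ext, steps the source phi_ext in small increments with Newton's
method, and falls back to a damped relaxation
d(phi)/dt = -(phi + beta*sin(phi) - phi_ext), a stand-in for a
transient analysis, whenever the Newton step is ill-conditioned or
leaves the current branch. All names, tolerances, and step sizes are
assumptions.

```python
import math


def residual(phi, phi_ext, beta):
    return phi + beta * math.sin(phi) - phi_ext


def slope(phi, beta):
    return 1.0 + beta * math.cos(phi)


def newton(phi, phi_ext, beta, tol=1e-10, max_iter=20, min_slope=1e-3):
    """Newton iteration; returns None if ill-conditioned or not converged."""
    for _ in range(max_iter):
        f = residual(phi, phi_ext, beta)
        if abs(f) < tol:
            return phi
        d = slope(phi, beta)
        if abs(d) < min_slope:
            return None
        phi -= f / d
    return None


def relax(phi, phi_ext, beta, dt=0.05, tol=1e-10, max_steps=1_000_000):
    """Forward-Euler relaxation; it can only settle on a stable branch."""
    for _ in range(max_steps):
        f = residual(phi, phi_ext, beta)
        if abs(f) < tol:
            return phi
        phi -= dt * f
    raise RuntimeError("relaxation did not converge")


def sweep(phi_ext_values, beta, phi0=0.0, max_jump=0.5):
    """Follow the source path, returning the phase at each source value."""
    phi = phi0
    trace = []
    for phi_ext in phi_ext_values:
        cand = newton(phi, phi_ext, beta)
        if cand is None or slope(cand, beta) <= 0 or abs(cand - phi) > max_jump:
            cand = relax(phi, phi_ext, beta)   # time-domain fallback
        phi = cand
        trace.append(phi)
    return trace


if __name__ == "__main__":
    up = [0.01 * i for i in range(629)]
    trace = sweep(up + up[::-1], beta=3.0)
    print(trace[300], trace[629 + 328])  # same source value, two branches
```

With beta greater than one the relation is multivalued, and the
solution reported at a given source value depends on whether the
source was ramped up or down, which is the path dependence described
above.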

In the inductance extraction program (INDEX) by P. H. Xiao,
inductances are calculated on the basis of two-dimensional
modeling of sections of the layout. The inductances are modeled by
simple analytic expressions to keep the computation time within
acceptable limits. INDEX is designed to work with the MAGIC layout
system. MAGIC has interfaces with intermediate layout formats such
as CIF and Calma and has a corner-stitch data structure that makes
the extraction simple. In MAGIC, polygons are represented by
rectangles called tiles. Each tile has four pointers to its four
neighbors, which makes neighbor-related operations easy to
implement. A two-junction SQUID and its extracted representation
are shown in Fig. 11. The main aim of the circuit extraction is to
find and evaluate the parasitic inductances. Several improvements
are under consideration. Many tiles are too short in the current
flow direction for two-dimensional modeling to be accurate, so we
will consider ways of implementing three-dimensional modeling.
Another important help for the designer would be an automatic
generation of the schematic including all parasitic components
rather than the presently used netlist.
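
A per-tile estimate of the kind a two-dimensional extractor might
use can be sketched with the familiar expression for a
superconducting strip over a superconducting ground plane,
L = mu0 (l/w) [h + lam1 coth(t1/lam1) + lam2 coth(t2/lam2)], with
fringing fields neglected. The report does not state which
expressions INDEX uses, so this formula, the film dimensions, and
the function names are assumptions made for the illustration.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space, H/m


def inductance_per_square(h, lam1, t1, lam2, t2):
    """Inductance per square (H) of a wide strip over a ground plane.
    h: insulator thickness; lam, t: penetration depth and thickness
    of the strip (1) and the ground plane (2), all in metres."""
    return MU0 * (h + lam1 / math.tanh(t1 / lam1) + lam2 / math.tanh(t2 / lam2))


def tile_inductance(length, width, l_square):
    """Inductance of one rectangular tile carrying current along `length`."""
    return l_square * length / width


if __name__ == "__main__":
    # Illustrative values only: 200 nm films and insulator, 90 nm depth.
    l_sq = inductance_per_square(200e-9, 90e-9, 200e-9, 90e-9, 200e-9)
    tiles = [(10e-6, 5e-6), (4e-6, 2e-6)]   # (length, width) of tiles in series
    total = sum(tile_inductance(l, w, l_sq) for l, w in tiles)
    print(f"{l_sq * 1e12:.3f} pH per square, {total * 1e12:.3f} pH total")
```

Summing tiles in series along the current path gives the extracted
inductance of a line. The estimate degrades for tiles that are short
in the direction of current flow, which is the limitation noted
above.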

The more general problem of designing large circuits and their
layouts starting from logic descriptions, as is done for
semiconductor circuits, will be addressed in follow-on work under
the University Research Initiative.

8. Niobium Integrated-Circuit Process

D. F. Hebert has developed a process in our Microfabrication
Facility for fabricating niobium superconductive integrated
circuits with good parameter control. The process is capable of
producing excellent quality Nb/AlOx/Nb Josephson junctions as
small as 1.6 x 1.6 µm with critical current densities as high as
3600 A/cm². Some examples of I-V characteristics are shown in
Fig. 12. The process features molybdenum resistors in which sheet
resistance is controlled to within a few percent of design value
at cryogenic temperature by use of an in-situ resistance
measurement during deposition.
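
To give a sense of scale, the critical current of the smallest
junction at the highest quoted current density can be worked out as
the product of current density and area. The sketch assumes the
smallest junction is a 1.6 µm square; the function name is
hypothetical.

```python
def critical_current(jc_a_per_cm2, side_um):
    """Critical current (A) of a square junction of side `side_um` µm."""
    side_cm = side_um * 1e-4          # 1 µm = 1e-4 cm
    return jc_a_per_cm2 * side_cm ** 2


if __name__ == "__main__":
    ic = critical_current(3600.0, 1.6)
    print(f"{ic * 1e6:.1f} µA")
```

This gives a critical current of roughly 90 µA for such a junction.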

The innovative use of VLSI-quality oxides is being incorporated to
make possible high-density circuits. We have replaced the
previously used SiO by SiO2 and pioneered the use of PECVD oxide
for layers of insulator below the junctions. Insulators above the
junctions should be deposited at a lower temperature than PECVD,
since the Nb/AlOx/Nb junctions are known to degrade at
temperatures above 150 °C. We have implemented a low-temperature
(90 °C) reactively sputtered silicon process to form SiO2 for this
purpose. We are evaluating LPCVD oxide, which is used in
semiconductor technology to form excellent insulating layers.

Fig. 12. Josephson junctions fabricated in the UCB
Microfabrication Laboratory.

In the follow-on work under the University Research Initiative,
the development of improved insulators will continue, as will
other developments of the niobium process.

REFERENCES 

1. E. S. Fang, D. Hebert, and T. Van Duzer, "A multi-gigahertz
Josephson flash A/D converter with a pipelined encoder using
large-dynamic-range current-latch comparators," IEEE Trans. Magn.,
Vol. 27, pp. 2891-2894, March 1991.

2. E. S. Fang, "A Josephson flash-type analog-to-digital converter
and related topics in superconductive circuits," Ph.D.
Dissertation, University of California, Berkeley, 1991.

3. H. Luong, D. Hebert, and T. Van Duzer, "Fully parallel
superconducting analog-to-digital converter," IEEE Trans. Appl.
Superconductivity, Vol. 3, pp. 2633-2636, March 1993.

4. E. Anderson, "A new type of superconducting flux-transfer
logic," Report for M.S. degree, Department of Electrical
Engineering and Computer Sciences, University of California,
Berkeley, 1991.

5. P. H. Xiao, "Superconducting delta-sigma oversampling A/D
converter," IEEE Trans. Appl. Superconductivity, Vol. 3, pp.
2625-2628, March 1993.

6. V. Nandakumar, "Design, fabrication, and testing of a Josephson
shift register," Ph.D. Dissertation, University of California,
Berkeley, 1990.

7. J. Fleischman, "Miniaturization of a superconducting flux-mode
shift register," Report for M.S. degree, Department of Electrical
Engineering and Computer Sciences, University of California,
Berkeley, May 1990.

8. J. Fleischman, "A flux-shuttle shift register and computer
architecture for superconductive digital systems," Ph.D.
Dissertation, University of California, Berkeley, 1993.

9. D. A. Feld, D. F. Hebert, and T. Van Duzer, "A 5-32 bit decoder
for application in a crossbar switch," IEEE Trans. Appl.
Superconductivity, Vol. 3, pp. 2671-2674, March 1993.

10. D. A. Feld, "A Josephson bit-serial decoder for application
in a crossbar switch," Ph.D. Dissertation, University of
California, Berkeley, 1993.

11. J. Fleischman and T. Van Duzer, "Computer architecture issues
in superconductor microprocessors," IEEE Trans. Appl.
Superconductivity, Vol. 3, pp. 2616-2619, March 1993.

12. E. S. Fang and T. Van Duzer, "An efficient method for finding
dc solutions for Josephson circuits," IEEE Trans. Appl.
Superconductivity, Vol. 1, pp. 126-133, September 1991.

13. P. H. Xiao, E. Charbon, A. Sangiovanni-Vincentelli, T. Van
Duzer, and S. W. Whiteley, "INDEX: An inductance extractor for
superconducting circuits," IEEE Trans. Appl. Superconductivity,
Vol. 3, pp. 2629-2632, March 1993.


APPENDIX 


PUBLISHED  PAPERS 


IEEE TRANSACTIONS ON MAGNETICS, VOL. 27, NO. 2, MARCH 1991        2891


A  MULTI-GIGAHERTZ  JOSEPHSON  FLASH  A/D  CONVERTER 
WITH  A  PIPELINED  ENCODER 

USING  LARGE-DYNAMIC-RANGE  CURRENT-LATCH  COMPARATORS 

Emerson S. Fang, David Hebert, and Theodore Van Duzer

Department of Electrical Engineering and Computer Science
University of California at Berkeley
Berkeley, California 94720


Abstract 

We present the design of a multi-gigahertz 4-bit A/D converter
with a pipelined encoder. A wideband and large dynamic range
comparator serves as the basic building block for both the
quantizer and the encoder, which simplifies the design. We will
show the design of the comparator and the building of the
quantizer and the encoder with the comparator circuits. Simulation
and initial test results are presented, and the possibility of
adapting the design to high-Tc circuits is also discussed.


Introduction 


A single-stage flash-type Josephson A/D converter consists of a
quantizer and an encoder as shown in Fig. 1. The quantizer is a
string of $2^n - 1$ comparators in parallel for an n-bit converter.
It generates a thermometer code of the analog input. The encoder
converts the thermometer code to a binary output. The speed of an
A/D converter is expressed by two factors, clock or conversion
rate and bandwidth. The bandwidth of an A/D converter is defined
as the maximum frequency of a sinusoidal analog signal it can
convert without aliasing. The maximum clock rate is determined by
the switching speed of the comparators in the quantizer and the
logic gates in the encoder. In the absence of a sample-and-hold,
the maximum bandwidth is determined by the aperture time of the
comparators in the quantizer, where the aperture time
$t_a \le 1/(2^n \pi f_B)$ [1,2]. The reader can refer to
references 1 and 2 for a more detailed discussion of
speed-limiting factors in flash-type Josephson A/D converters.
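
As a numerical check on the aperture-time requirement, the sketch
below evaluates the bound for a 4-bit converter. It reads the
relation above as t_a <= 1/(2^n pi f_B), which is an interpretation
of the printed formula, and takes a 10 GHz bandwidth as an
illustrative value. The function name is hypothetical.

```python
import math


def max_aperture_time(n_bits, bandwidth_hz):
    """Largest aperture time (s) allowed for an n-bit converter that is
    to resolve a sine wave at `bandwidth_hz` without a sample-and-hold."""
    return 1.0 / (2 ** n_bits * math.pi * bandwidth_hz)


if __name__ == "__main__":
    t_a = max_aperture_time(4, 10e9)
    print(f"{t_a * 1e12:.2f} ps")
```

For four bits and 10 GHz the bound is about 2 ps, which indicates
why a pulse of a few picoseconds or less is needed to define the
sampling instant.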


Fig. 1. An n-bit single-stage flash-type Josephson A/D converter.


The  Basic  Current-Latch  Comparator 
A schematic diagram of the current-latch comparator is shown in
Fig. 2. It is an improved version of the circuit we reported
earlier [3]. Figures 3 and 4 are the characteristic curve of the
one-junction SQUID S2 and the threshold curve of the symmetric
two-junction SQUID S3, respectively. The one-junction SQUID S1 is
a pulser that generates a positive pulse on the rising edge of
clock 1 and a negative pulse on the falling
This work was sponsored by AF Contract F19628-86-K-0033 and F19628-90-
Fig. 2. The current-latch comparator.

Li  •  337|tA.  L,  «  6pH.  R-,  -  0.7D. 

Li  •  337ttA.  Lj  -  IpH,  L 1  >  6pH.  ILi  -  0.70, 

La  •  U  «  ISOuA.  L,  -  3pH.  R«  -  2.40. 
fc-0.6S.R,>SnR,-6O.RL>  >0O- 

edge. The one-junction SQUID S2 (comprising J2, L2, and L'2) is
the sampling SQUID operating in a current-latching mode. Figure 3
illustrates the basic operation of the current latch. The
dc  bias  current  I^u  establishes  the  threshold  input  current  for  the 
latch.  If  !„  +  I^  is  above  the  threshold  upon  arrival  of  Ip.  the 
operating  point  of  sampling  SQUID  Sy  will  jump  one  step  on  the 


Fig. 3. Operation of the current-latch SQUID.
staircase characteristic. This makes the threshold current of the
read SQUID S3 less than the critical current of J3, as shown
in  Fig.  4.  On  the  rising  edge  of  clock  2,  SQUID  S3  will  switch 
before  J3  and  prevent  Jj  going  into  the  voltage  state.  The 
opposite  will  happen  if  I,)  is  less  dian  the  threshold  current  Tlw 
Mwanw  UOlW  b)  the  negative  peak  of  l  -■ 


R|.  4.  Threshold  cinre  of  the  read  SQUID  S>  In*  and  I-r  ■*  Ac  high  and 
low  cuncm  level  in  L  3  respectively. 

There are several advantages of this design. First, the aperture
time of the comparator is determined by the pulse width of the
pulser SQUID, which can be a few picoseconds or less. This means an
achievable bandwidth in the multi-gigahertz range for a 4-bit
converter. Second, the symmetric read SQUID $S_3$ provides isolation
for the sampling SQUID $S_2$ during the aperture time. Any signal or
noise fed back from the output will split equally between the two
branches of $S_3$, and the couplings back to the sampling SQUID $S_2$
cancel each other, which makes the comparator unidirectional, to
first order. This improves the sensitivity of the comparator.
Finally, the low impedance of the input node makes biasing and
superposition of signals relatively easy. It is also a desirable
load for an analog signal current source, since it gives minimum
signal attenuation, as can be verified by using the Norton
equivalent for the signal source.
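
The link between aperture time and bandwidth can be checked with a
few lines of arithmetic. The sketch below assumes the aperture-time
bound has the standard form $t_a \le 1/(2^n \pi f_B)$ and solves it
for $f_B$. The function name and the 2 ps pulse width are
illustrative assumptions, not values taken from the design.

```python
import math


def max_bandwidth_hz(n_bits: int, aperture_s: float) -> float:
    """Largest bandwidth f_B allowed by t_a <= 1 / (2**n * pi * f_B)."""
    return 1.0 / (2 ** n_bits * math.pi * aperture_s)


# Assumed example: a 4-bit converter whose comparators have a 2 ps aperture.
f_b = max_bandwidth_hz(4, 2e-12)
print(f"bandwidth limit: {f_b / 1e9:.1f} GHz")
```

With these assumed numbers the limit comes out near 10 GHz, which is
in line with the multi-gigahertz figure quoted for a 4-bit converter.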

Dynamic Range of the Comparator

In  our  previous  design  (3],  the  positive  pulse  L  is  used  for 
sampting,  and  the  negative  pulse  1^  is  used  for  resening.  Reset¬ 
ting  will  occur  only  if  Ip  -f  Ip  is  greater  than  -  Ij,.  and  the 
range  of  the  input  signal  I^  is  less  than  Ip  +  Ip  - 1^^.  The 
current pulse amplitudes are limited since they are generated by the
one-junction SQUID $S_1$. The dynamic range of the input signal is
therefore limited by the pulse amplitudes. On the other hand, for an
n-bit single-stage flash A/D converter, the comparators need a
dynamic range of $2^n I_{LSB}$, where $I_{LSB}$ is the current
corresponding to one least significant bit. In our previous design,
the comparator has sufficient dynamic range for a 4-bit converter if
$I_{LSB}$ is less than 10 µA and the junction critical current
density is above 2500 A/cm². A larger dynamic range is desirable.
With the modulating signal, which can be sinusoidal and is at the
clock frequency, the dynamic range of the analog signal is then
limited by $\Phi_0/(L_2 + L_2')$ and the amplitude of the modulating
signal. This makes a many-fold improvement in the dynamic range over
the previous design. The additional requirement is that the positive
pulse should arrive near the peak of the modulating signal, if a
sinusoid is used. The required timing of the pulse can be achieved
with a delay line, and by adjusting the bias to the one-junction
SQUID pulser $S_1$.
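
To put numbers on the two quantities in this section, the sketch
below evaluates the required range $2^n I_{LSB}$ for a 4-bit
converter with a 10 µA step, and the current scale set by one flux
quantum in the sampling loop. The loop inductance of 7 pH (1 pH plus
6 pH) is an assumption based on one reading of the partly illegible
parameter list under Fig. 2. The function names are hypothetical.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum, in webers


def required_range_a(n_bits: int, i_lsb_a: float) -> float:
    """Dynamic range needed by the comparators of an n-bit flash converter."""
    return 2 ** n_bits * i_lsb_a


def flux_quantum_current_a(loop_inductance_h: float) -> float:
    """Current scale Phi_0 / L for a loop of total inductance L."""
    return PHI_0 / loop_inductance_h


required = required_range_a(4, 10e-6)               # 4 bits, 10 uA per LSB
available = flux_quantum_current_a(1e-12 + 6e-12)   # assumed 7 pH loop
print(f"required {required * 1e6:.0f} uA, Phi_0/L {available * 1e6:.0f} uA")
```

Under these assumptions the flux-quantum term alone is already larger
than the 160 µA needed for 4 bits, before the modulating-signal
amplitude is counted.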

The  Three-Phase  Pipelined  Encoder 

In a pipelined encoder, the encoding function is done in a series of
stages. If we have an encoder for a 2-bit converter, a 3-bit encoder
can easily be constructed in two stages using two 2-bit encoders and
some additional logic, as shown in Fig. 5. A 4-bit encoder can be
built from two 3-bit encoders and an additional stage of logic
functions. The extension is identical to that shown in Fig. 5.

Fig. 5. A pipelined 3-bit encoder.

The circuit shown in Fig. 6 is very similar to the comparator, except
that the positions of junction $J_5$ and SQUID $S_3$ are
interchanged; hence, the output is inverted. The bias current can be
adjusted to give two kinds of threshold current $I_{th}$. If we
adjust $I_{bias}$ so that $I_A$ or $I_B$ < $I_{th}$ < $I_A + I_B$, a
NAND gate will result, where $I_A$ and $I_B$ are the logic '1'
current levels for the A and B inputs, respectively. On the other
hand, if we adjust $I_{th}$ to be less than either $I_A$ or $I_B$,
then a NOR gate results. Changing $J_5$ and $S_3$ back to the
position in Fig. 2, we can form AND and OR gates. The basic logic
gates that can be implemented with the comparator design are NAND,
NOR, AND and OR.


The block diagram of the 2-bit encoder along with clock phases is
shown in Fig. 7. All clock signals are assumed to be sinusoidal and
at the same frequency. Figure 7 also shows the gate-level
implementation of the 2-bit encoder. To build a 3-bit encoder as
shown in Fig. 5, we also need 2-input multiplexers. Figure 8 shows a
design for the multiplexer. A 4-bit encoder will have four 2-bit
encoders, seven 2-input multiplexers and various buffers and
inverters, and a pipe latency of 5 1/3 clock cycles.
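
The staged construction can be illustrated with a short behavioral
sketch. It assumes that each merge stage takes the middle comparator
output as the new most significant bit and uses one 2-input
multiplexer per remaining bit to select between the two half-size
encoders. That partition is an assumption made for illustration, and
all names are hypothetical. The recursion bottoms out at one bit for
brevity, whereas the counting helpers treat the 2-bit encoder as the
base block, as in the text.

```python
def encode(thermo):
    """Binary bits (MSB first) for a thermometer code of length 2**n - 1.

    thermo[0] is the output of the lowest-threshold comparator.
    """
    if len(thermo) == 1:
        return [thermo[0]]
    mid = len(thermo) // 2
    msb = thermo[mid]
    low = encode(thermo[:mid])        # encoder fed by the lower half
    high = encode(thermo[mid + 1:])   # encoder fed by the upper half
    # One 2-input multiplexer per remaining bit, selected by the MSB.
    return [msb] + [h if msb else l for h, l in zip(high, low)]


def mux_count(n_bits):
    """2-input multiplexers needed when 2-bit encoders are the base blocks."""
    if n_bits <= 2:
        return 0
    return 2 * mux_count(n_bits - 1) + (n_bits - 1)


def two_bit_encoder_count(n_bits):
    """Number of 2-bit encoders at the bottom of the recursion."""
    if n_bits <= 2:
        return 1
    return 2 * two_bit_encoder_count(n_bits - 1)


print(encode([1] * 11 + [0] * 4))                 # 11 comparators high
print(two_bit_encoder_count(4), mux_count(4))
```

Under this assumed partition the counts for a 4-bit encoder come out
as four 2-bit encoders and seven multiplexers, matching the figures
quoted above.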
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Design  and  Simulation 

The schematic of a complete design of the comparator is shown in
Fig. 2. Clock 1 and clock 2 are sinusoidal, and clock 2 has a phase
lag of 120°. Junction $J_w$ is a wave-shaping junction. It is not
essential for correct operation of the circuit; however, inclusion
of the wave-shaping junction improves the tolerance on phase-lag
error between clock 1 and clock 2. The $L I_c$ product of the
sampling SQUID $S_2$ should be about $\Phi_0$ for the best operating
margin. The loop inductance of the read SQUID $S_3$ should be about
half of $L_2'$ in SQUID $S_2$ to get maximum sensitivity. The design
in Fig. 2 is for a junction critical current density of 600 A/cm².
The clock rate for this circuit can reach 2 GHz in simulation. At
this speed, the delay line is not needed. The adjustment in the bias
current for the SQUID pulser $S_1$ is sufficient to give an
effective pulse height, as defined in Fig. 3, of more than 20 µA.

As was pointed out at the beginning, the speed of an A/D converter
is expressed by clock rate and bandwidth. Bandwidth is defined as
the maximum frequency of a sinusoidal signal that the A/D converter
can convert without aliasing. The bandwidth is limited by the
sampling theorem to one half of the clock rate, but the actual
bandwidth can be even lower due to nonidealities in the circuit. The
limit on clock rate for the comparator is attributed to the
punchthrough effect, and the limit on bandwidth is the result of the
finite pulse width from the pulser. At low junction current density,
the punchthrough effect dominates, and the pulser is the limiting
factor at high current density. The crossing point is at a junction
current density of about 2000 A/cm².

To determine the input range of the comparator, we apply a dc analog
signal. It can be seen from Fig. 3 that if the input signal is
greater than $I_{low} - I_{bias}$, the current latch $S_2$ will not
reset. For this design, the analog current value at which reset
cannot occur is 200 µA. In the A/D converter application, the
comparator can take an input peak-to-peak sinusoid up to 400 µA.
This is because the sampling theorem limits the bandwidth to half of
the sampling rate and, between sample and reset, which is half of
the clock period, the input sinusoid at the band-limiting frequency
cannot slew more than half of its full range. One point should be
noted: when the input current is above 340 µA, the sampling SQUID
$S_2$ will jump two steps on its characteristic curve. It is not a
problem as long as the operating point jumps down at least one step
during reset, which will happen as long as the analog current is
less than 500 µA. If for any reason the input current range must
exceed 400 µA, the A/D converter will still operate correctly if the
signal bandwidth is limited to substantially below the Nyquist rate,
or a current limiter similar to the one proposed by Petersen [4] is
used in front of the least significant comparator in the quantizer.

The RMS noise current at the input node is estimated from white
noise analysis to be less than 3 µA. This gives a comparator dynamic
range of 133 or 42 dB, corresponding to 6 bits. None of the present
processes can achieve a junction critical current uniformity better
than one percent, which would be needed to allow a quantization step
of 3 µA. For the present design, a quantization step of 20 µA is
used.
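
The dynamic-range figures can be reproduced from the 400 µA
peak-to-peak input range and the 3 µA noise bound. The conversion to
bits below uses the usual ideal-quantizer relation SNR = 6.02 N +
1.76 dB for a sine-wave input. That the authors used this relation is
an assumption, and the variable names are illustrative.

```python
import math

full_scale_a = 400e-6   # peak-to-peak input range quoted in the text
noise_rms_a = 3e-6      # upper bound on RMS input noise quoted in the text

ratio = full_scale_a / noise_rms_a
ratio_db = 20 * math.log10(ratio)

# Assumed conversion: ideal N-bit quantizer, SNR = 6.02 * N + 1.76 dB.
bits = (ratio_db - 1.76) / 6.02
whole_bits = math.floor(bits)

print(f"ratio {ratio:.0f}, {ratio_db:.1f} dB, {bits:.2f} bits")
```

The ratio and decibel value follow directly from the quoted currents.
Only the final step to whole bits depends on the assumed conversion
formula.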

Simulations of the basic circuits are performed with JSIM [5,6].
Figure 9 shows the simulation result of a comparator clocked at
2 GHz with a 1 GHz sinusoidal input. At the time of the first
sampling pulse, the input exceeds the comparator threshold current
of 20 µA. During the second sampling pulse, the input is less than
20 µA. The simulation result indicates correct operation of the
comparator. Figure 10 shows the simulation result for an
Exclusive-OR gate constructed from 3 NAND gates, which were
discussed previously. From the simulation result, we can verify that
the logic function performed by the circuit is an Exclusive-OR.

Fig. 9. (a) Input and pulse signals to the comparator, (b) simulation
result of output current in the 10 Ω resistive load.

Fig. 10. Simulation result of output voltage across the 10 Ω
resistive load of an Exclusive-OR gate corresponding to inputs of
'00', '01', '10' and '11'. The logic '1' input current is 40 µA.


Test  Results 

Initial low-speed test runs were done to verify the functionality of
the design and to extract design parameters. A process run without
resistors was made. In this run, the basic comparator was laid out
without junction $J_5$, the pulser SQUID $S_1$ or the wave-shaping
junction $J_w$. The layout is shown in Fig. 11. Damping for the
sampling SQUID was provided externally. The clock signal to the read
SQUID is at 800 Hz, and the input to the sampling SQUID is at
100 Hz. The output is shown in Fig. 12. In each photo, the bottom
waveform is the input signal and the top waveform is the voltage
output of the read SQUID. A ground level shift is visible in the
photos; this does not affect the circuit, since it is due to finite
resistance in the sample holder leads. The result of the test
verifies correct operation of the comparator.

Scaling to Higher Junction Current Density

For a process with higher junction current density, the design
essentially remains unchanged, except for the damping resistors. The
damping resistance for a SQUID is proportional to $\sqrt{L/C}$,
where $L$ is the loop inductance and $C$ the junction capacitance.
The junction capacitance is inversely proportional to $J_c$;
therefore, the damping resistance should be increased in proportion
to $\sqrt{J_c}$ for a given $L$. Figure 13 shows the same simulation
as Fig. 9, except with $J_c$ at 2400 A/cm²; the clock rate is 8 GHz
and the input is at 4 GHz. At this junction current density, the
punchthrough and the finite pulse width make about equal
contributions to the speed limit. Below this current density, the
bandwidth of the A/D converter is 1/2 of the clock rate. For higher
current density, the bandwidth is determined by the pulse width of
the pulser.

Fig. 11. Layout of the sampling SQUID and the read SQUID.

Fig. 12. (a) Waveform of a logic '1' input (bottom) and the
corresponding comparator output (top), (b) waveform of a logic '0'
input (bottom) and the corresponding output (top).
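
The resistor scaling rule in this section can be written as a
one-line helper. The sketch assumes the damping resistance varies as
$\sqrt{L/C}$ with the junction capacitance inversely proportional to
the critical current density, so that at fixed loop inductance and
critical current the resistance scales as the square root of the
current density. The 0.7 Ω starting value and the 600 A/cm² baseline
are example inputs, and the function name is hypothetical.

```python
import math


def scaled_damping_resistance(r_ohm, jc_old, jc_new):
    """Scale a damping resistor as sqrt(Jc) at fixed L and critical current."""
    return r_ohm * math.sqrt(jc_new / jc_old)


# Example: moving an assumed 0.7 ohm resistor from 600 to 2400 A/cm^2.
r_new = scaled_damping_resistance(0.7, 600.0, 2400.0)
print(f"{r_new:.2f} ohm")
```

A fourfold increase in current density therefore doubles each damping
resistor under this rule.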

Adapting to High-$T_c$ Process

It is expected that junctions made with the higher temperature oxide
superconductors will, for some time into the future, be
nonhysteretic. This can pose a significant problem for circuits that
require latching operation of the junctions. The current-latching
operation of the comparator does not require a hysteretic junction;
however, the readout circuit requires some modification. To keep the
same design configuration as in Fig. 2, extra capacitance can be
added to junction $J_5$ and SQUID $S_3$ to make them hysteretic,
because the functions of junction $J_5$ and SQUID $S_3$ in Fig. 2
cannot be realized with nonhysteretic junctions.

Adding a lot of capacitance to make a nonhysteretic junction
hysteretic can significantly slow down an A/D converter. A
modification that requires junctions with much less hysteresis can
be achieved by removing $J_5$ in Fig. 2. Extra shunt capacitance may
still be needed in $S_3$ to provide enough current-drive capability.
The required current drive is less than 100 µA to a resistive load.
Since the junction $J_5$ in Fig. 2 is no longer present, the
amplitude of Clock 2, which now serves as a clocked bias, must be
controlled very precisely. There is an alternative to the clocked
bias. The junctions in SQUID $S_3$ are not hysteretic provided they
are not shunted with large capacitance; then they are
self-resetting, and Clock 2 can be changed to a dc bias.

Fig. 13. (a) Input and pulse signals to the comparator, (b)
simulation result of output current in the 10 Ω resistive load.

Conclusion

We have shown the design of a large-dynamic-range comparator and the
design of a complete 4-bit A/D converter with a pipelined encoder
using the basic comparator circuit as a building block. Simulation
results show multi-gigahertz operation of the A/D converter. The
initial low-speed test results have shown correct functionality of
the comparator. The current-latching characteristics of the
comparator allow adaptation of the circuit to high-$T_c$
superconductors with some modifications.
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Abstract—  This  paper  presents  measurements  that  follow 
up on Fang's design of a three-bit wideband analog-to-digital
converter reported earlier [1]. The original design has been
modified, and some circuit parameters have been changed to
optimize the margins. Based on this modified design, we have
fabricated and been able to demonstrate the functionality not
only of simple logic gates, including inverters, AND, OR,
NOR, and XOR, but also of much more complicated
combinations,  including  a  complete  two-bit  analog-to-digital 
converter  and  a  complete  three-bit  binary  encoder.  After  a  brief 
description  of  the  design  and  modifications,  low-speed  tests  of 
these  circuits  will  be  presented  and  discussed. 

I.  INTRODUCTION 

In Josephson technology, the periodic threshold characteristics of
two-junction SQUIDs allow a unique way to implement an N-bit
flash-type analog-to-digital converter (ADC) with only N comparators
[2, 3, 4]. However, this type of converter suffers from limited
bandwidth due to the dynamics of SQUID loops. To achieve wider
bandwidth, the conventional flash-type architecture, in which
$2^N - 1$ comparators are used, has been attempted [1, 5]. Fang has
reported his design of such a converter, which employs a wideband
and large-dynamic-range current-latch comparator as the building
block for both the quantizer and the binary encoder [1]. Following
up on his design, we have made some design modifications and have
changed some circuit parameters to maximize the circuit margins. A
two-bit ADC and a three-bit binary encoder based on the modified
design have been fabricated, and their functionalities have been
successfully verified. In this paper, we will review the design,
describe the modifications, and present the experimental results.

II. CIRCUIT DESCRIPTION AND PERFORMANCE

A. Design Overview

As in the conventional flash-type architecture, this design requires
a quantizer to sample and assign each sampled analog value to one of
the possible output levels and a binary encoder to convert the
output of the quantizer from a thermometer code to a useful binary
representation. In order to achieve an N-bit resolution, we use a
bank of $2^N - 1$ identical comparators to realize the quantizer and
pipelined logic gates to form the encoder. A unique and advantageous
feature of this design is that the same comparator circuit used for
the quantizer can readily be reconfigured to implement all the logic
gates needed for the binary encoder.

Comparator: Shown in Fig. 1 is the schematic diagram of the
comparator building block. A hysteretic one-junction SQUID (composed
of $J_1$, $L_1$, and $L_2$) is used as a comparator to sample the
analog input. A two-junction SQUID ($J_2$ and $J_3$) in series with
a single junction ($J_4$) functions not only as a readout device but
also as a buffer isolating the output from the input. To minimize
the aperture time and to widen the bandwidth of the comparator,
another one-junction SQUID ($J_p$, $L_p$, and $R_p$) acting as a
pulser is connected to the input. Finally, to reduce the sensitivity
of the circuit to the amplitude of the second clock CLK2, the
readout SQUID is biased by a clock junction ($J_5$). As will be
discussed later, junction $J_6$ is included in the modified version
to increase the circuit margins.

Fig. 1 Circuit diagram for the comparator. Junction $J_6$ is
included in the modified version.

The bias current $I_{bias}$, together with the critical current of
the junction $J_1$ and the pulser output $I_p$, sets the threshold
level for the comparator. If the net input current is less than this
threshold level, no current is transferred to the inductors $L_1$
and $L_2$. When the second-phase clock CLK2 rises, junction $J_4$,
which has a smaller critical current than that of the two-junction
SQUID, switches to the voltage state first and thus prevents the
two-junction SQUID from switching. As a result, the output is low.
On the other hand, if the net input current is larger than the
threshold level, current is transferred to $L_1$ and $L_2$. As the
control current for the two-junction SQUID, it reduces the SQUID
critical current below that of the single junction $J_4$.
Consequently, when the clock CLK2 rises, the SQUID switches to the
voltage state, and the output goes high.

Logic Gates: The comparator configuration described above can be
used to implement logic gates as well. Replacing $I_{ref}$ with
another input current, and setting the bias so that the threshold
level is larger than either input but smaller than their sum, an AND
gate is obtained. Likewise, if the bias is adjusted so that the
threshold level is smaller than either input, an OR gate is
achieved. Inversion functions, including inverters, NAND, and NOR,
can be easily obtained by exchanging the positions of the single
junction $J_5$ and the two-junction SQUID.
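
The threshold rules for the four gates can be captured in a small
behavioral model. The sketch treats the comparator as an ideal
threshold element on the summed input current and models the
exchanged junction and SQUID positions as a simple output inversion.
The 40 µA logic '1' current, the zero logic '0' current, the specific
threshold values, and all names are assumptions for illustration.

```python
LOGIC_ONE_A = 40e-6  # assumed logic '1' input current; logic '0' is zero

AND_THRESHOLD_A = 1.5 * LOGIC_ONE_A  # above either input, below their sum
OR_THRESHOLD_A = 0.5 * LOGIC_ONE_A   # below either input alone


def comparator_gate(a, b, threshold_a, inverted=False):
    """Output of a comparator used as a two-input gate (a, b are 0 or 1)."""
    total_a = (a + b) * LOGIC_ONE_A
    high = total_a > threshold_a
    return int(not high) if inverted else int(high)


def truth_table(threshold_a, inverted=False):
    """Gate outputs for inputs 00, 01, 10, 11 in that order."""
    return [comparator_gate(a, b, threshold_a, inverted)
            for a in (0, 1) for b in (0, 1)]


print("AND ", truth_table(AND_THRESHOLD_A))
print("OR  ", truth_table(OR_THRESHOLD_A))
print("NAND", truth_table(AND_THRESHOLD_A, inverted=True))
print("NOR ", truth_table(OR_THRESHOLD_A, inverted=True))
```

Tying both inputs together turns the AND and NAND settings into a
buffer and an inverter, since the doubled input then crosses the
higher threshold whenever the shared input is high.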

B.  Design  Modifications 

Comparator: Even though the readout circuit is very insensitive to
the clock bias CLK2, the comparator designed by Fang [1] suffers
from small margins, especially in the critical current of the
junctions. The main reason is that the threshold level of the
comparator is directly dependent on the critical current of the
junction and the clock amplitude CLK1. Any variation in either of
these can reduce the margins significantly.

To improve the margins, another single junction ($J_6$) is added in
series with the sampling junction $J_1$, as can be seen in Fig. 1.
Effectively, this addition creates a “race” between the two
junctions $J_1$ and $J_6$, just like that between the single
junction $J_4$ and the readout SQUID. As long as the bias is in an
appropriate range, one and only one junction, whichever has its
critical current exceeded first, will switch to the voltage state.

The modified version of the comparator was simulated extensively
with JSIM [6] and the circuit parameters were changed to maximize
the margins. The final circuit parameters, with which a margin of
±37% for the junction critical current has been achieved, are listed
in Table 1. The original parameters are also included for purpose of
comparison.

Encoder: Fang suggested using different combinations of NAND, NOR,
and OR gates to implement two-bit encoders, and then using these
two-bit encoders together with MUXes to construct a three-bit binary
encoder for the converter [1]. However, since the input to the
encoder is in thermometer code, the design can be much simplified.
Taking advantage of the unique and special pattern of such a
thermometer-coded input, we have modified the architecture and have
been able to implement a complete three-bit binary encoder using
only buffers and three-input XOR gates.

C.  Circuit  Implementation 

Fig.  2  shows  the  gate-level  implementation  of  a  two-bit 
encoder,  which  basically  consists  of  a  two-stage  buffer  and  a 
three-input  XOR  gate.  In  Fig.  3  is  the  block  diagram  of  a 
complete  three-bit  analog-to-digital  converter,  including  a 


Table 1: Design parameters for the comparator

Parameter    Original    Modified
Ic(J1)       337 µA      367 µA
Ic(J2)       150 µA      163 µA
Ic(J3)       150 µA      163 µA
Ic(J4)       225 µA      245 µA
Ic(J5)       N/A         870 µA
Ic(J6)       N/A         414 µA
Ic(Jp)       337 µA      367 µA
RL           10 Ω        12.5 Ω
Rc           6 Ω         6 Ω
Rp           5 Ω         5 Ω
L1           3 pH        3 pH
L2           3 pH        3 pH
L3           1.5 pH      1.5 pH
L4           1.5 pH      1.5 pH
Lp           6 pH        6 pH


Fig.  2  Gate-level  implementation  of  a  two-bit  encoder 

a)  two-stage  buffer  (2BUFFER) 

b)  three-input  quasi-XOR  (3XOR) 


three-bit quantizer, a buffer-and-inverter stage, and a three-bit
binary encoder. The comparators in the quantizer are the modified
version shown in Fig. 1. The single-stage buffers and inverters are
just AND and NAND gates reconfigured from the same comparator
circuit with the two inputs connected together. Finally, the
three-bit binary encoder is realized with the two-stage buffers and
three-input XOR gates that are used for a two-bit encoder.

As illustrated in Fig. 2, the three-input XOR gates are actually
“quasi” in the sense that they function correctly only if the inputs
are thermometer-coded. However, it requires only three two-input
NAND gates to implement this quasi-XOR gate instead of twelve to
implement a pipelined standard three-input XOR gate. By using these
quasi-XOR gates, the circuit complexity is greatly reduced,
especially in terms of the junction count. The circuit is further
simplified by using only NAND gates to implement all the logic
blocks. The whole design could be equivalently designed with only
NOR gates.

Fig. 3 Block-diagram implementation of a complete three-bit
analog-to-digital converter
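
A quasi-XOR of this kind can be checked exhaustively in a few lines.
The decomposition below, out = NAND(NAND(a, not b), NAND(c, c)), is
one possible three-NAND realization that is correct when the inputs
satisfy a >= b >= c. It is an assumption for illustration and not
necessarily the gate arrangement of Fig. 2. The complement of b is
taken as already available from the buffer-and-inverter stage, and
the wiring of `encode3` is likewise a hypothetical arrangement built
only from pass-through signals and quasi-XOR gates.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def quasi_xor3(a, b, c):
    """Three-NAND quasi-XOR, valid for thermometer order a >= b >= c."""
    not_b = 1 - b  # assumed to come from the preceding inverter stage
    return nand(nand(a, not_b), nand(c, c))


def encode3(t):
    """Encode thermometer inputs t[0]..t[6] (I0..I6) into (D2, D1, D0)."""
    d2 = t[3]                              # buffered middle comparator
    d1 = quasi_xor3(t[1], t[3], t[5])
    low = quasi_xor3(t[0], t[1], t[2])
    high = quasi_xor3(t[4], t[5], t[6])
    d0 = quasi_xor3(low, t[3], high)
    return d2, d1, d0


for k in range(8):
    thermo = [1] * k + [0] * (7 - k)
    print(thermo, encode3(thermo))
```

The second-level gate that forms D0 still sees inputs for which this
decomposition gives the right parity, which is why the exhaustive
loop over all eight thermometer codes is a useful check.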

D.  Low-Speed  Performance 

The complete three-bit fully parallel analog-to-digital converter
shown in Fig. 3 has been designed and fabricated for low-speed
measurements. A chip photograph is shown in Fig. 4, mapping
one-to-one every block shown in Fig. 3. The total number of
junctions used in the quantizer, the encoder, and the whole ADC is
50, 200, and 320, respectively.

Functionality of the circuit described in Fig. 1 both as a
comparator and as basic logic gates has been successfully
demonstrated. As expected, all single-stage logic gates, including
inverters, AND, OR, NAND, and NOR, work with a margin as large as
±30%. More complicated logic gates that require a cascade of many
such simple gates, such as NAND driving an inverter, two-stage
buffers, three-input quasi-XOR gates, etc., have also been tested
and verified to function correctly, even though the margins become
somewhat lower than that of single gates.

We have also demonstrated functionality tests on a two-bit quantizer
and a two-bit encoder separately. The experimental results of the
quantizer are shown in Fig. 5. The first two traces are the two
clocks indicated in Fig. 1. The next three traces are three inputs
which were added to create a rising step analog signal. With the
threshold levels of the three comparators set 100 µA apart, this
choice of inputs covers all the possible combinations. The outputs,
shown as the last three traces in the figure, are in the correct
thermometer code.

Fig. 4 Chip photograph of the complete three-bit analog-to-digital
converter shown in Fig. 3

Fig. 5 Measurement of a two-bit quantizer. The traces are two
clocks, three inputs, and three outputs, respectively.

We have also succeeded in verifying the correct operation of the
three-bit thermometer-to-binary encoder. Figures 6a and 6b show the
outputs of the encoder corresponding to all possible combinations of
the inputs, as illustrated in a truth table (Table 2). In each of
the figures, the first three traces are the three-phase clock
signals, and the last three traces are the three outputs, $D_2$,
$D_1$, and $D_0$, respectively. The outputs in Fig. 6a are obtained
for the first four patterns in Table 2; the three lowest-level
inputs ($I_2$, $I_1$, and $I_0$) are shown as the middle three
traces and the other inputs ($I_3$ to $I_6$) are all low. The
outputs in Fig. 6b are obtained for the last four patterns in
Table 2; the three highest-level inputs ($I_6$, $I_5$, and $I_4$)
are shown
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Table  2:  Truth  table  for  a  three-bit  encoder 


I6  I5  I4  I3  I2  I1  I0    D2  D1  D0
 0   0   0   0   0   0   0     0   0   0
 0   0   0   0   0   0   1     0   0   1
 0   0   0   0   0   1   1     0   1   0
 0   0   0   0   1   1   1     0   1   1
 0   0   0   1   1   1   1     1   0   0
 0   0   1   1   1   1   1     1   0   1
 0   1   1   1   1   1   1     1   1   0
 1   1   1   1   1   1   1     1   1   1

Fig. 6 Measurement of the three-bit binary encoder for

a) the first four input patterns shown in Table 2,

b) the last four input patterns shown in Table 2.
From top to bottom are the three clocks, the three
inputs, and the three outputs, respectively.


as the three middle traces and the other inputs ($I_0$ to $I_3$) are
all high. There is a latency of 2 1/3 clock cycles due to the
pipeline, and all the outputs are correct.

We were able to show that subcircuits of the converter, including
the three-bit binary encoder, functioned correctly. To date, we have
not been able to verify the correct operation of the complete
three-bit ADC; possible reasons include flux trapping, circuit
defects, and the clock distribution. Simulations with JSIM [6] have
indicated that the converter with a current density of 1000 A/cm²
can function at a clock frequency as high as 5 GHz. The current
density should be increased to maximize the circuit bandwidth and
margins.

III. SUMMARY

Both comparators in a three-bit quantizer and logic gates
in a three-bit binary encoder have been designed using the
same  circuit  configuration.  The  circuits  have  been  fabricated, 
and  we  have  been  able  to  demonstrate  experimentally  the 
functionality  of  the  comparators  and  of  logic  gates  at  various 
levels  of  complexity.  We  have  successfully  verified  the  correct 
operations  of  a  complete  two-bit  analog-to-digital  converter 
and  of  a  complete  three-bit  binary  encoder.  Simulations  have 
shown  that  the  complete  three-bit  ADC  can  work  up  to  5  GHz. 
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Abstract- A  superconducting  delta-sigma  A/D  converter  is 
presented in this paper. The converter uses a low-pass filter
instead of the integrator found in the usual delta-sigma
architecture. The converter is analyzed by a behavior-level
simulation package as well as the circuit simulator JSIM. Its
performance is compared to the standard first-order delta-sigma
converter. The simulation shows that this converter can achieve a
70 dB signal-to-noise ratio (S/(N+D)) with an oversampling ratio of
128. This corresponds to an 11-bit resolution.


I. INTRODUCTION

Delta-sigma A/D converters have been receiving much attention lately
due to advances of modern VLSI technology. They inherently possess
some characteristics which naturally lend themselves to VLSI
high-level integration. First, only a small amount of analog
modulator circuitry is required in the design and the circuits have
a high tolerance to component mismatching. This means that component
trimming is not required to achieve high-resolution A/D conversion,
in contrast to the strict component matching requirement for the
other high-resolution A/D converters. In addition, the resolution of
delta-sigma converters can be scaled directly with the signal
conversion rate through the digital signal processing in later
stages. The resolution can be increased by increasing the sampling
rate. Furthermore, the oversampling technique of delta-sigma
converters greatly relaxes constraints on the anti-aliasing filter
at the front end; in many cases, a passive RC filter will suffice to
replace the usual complex and expensive high-order analog filters to
filter out high-frequency noise.

It is a natural extension to implement the delta-sigma A/D converter
in high-speed and low-power superconducting integrated circuit
technology. The ultra-high-speed sampling capability in
superconducting circuits can be exploited to achieve higher
resolution. Their low power consumption may also be of critical
importance in applications, such as infrared image processing, where
the power limitation obviates other technologies.

In this paper, we begin with an introduction to the principle of
delta-sigma conversion. Then the implementation of the delta-sigma
converter in superconducting technology is analyzed. We replace the
integrator in the usual delta-sigma converter by a low-pass filter
and compare its performance to the integrator converter. Next, a
superconducting circuit based on this modified architecture is
presented and simulation results are given. This is followed by a
discussion of the implementation of the superconductive digital
filter which shows that it is achievable within the current
superconducting technology.

Research supported by the U.S. Air Force Contract No. F19628-90-K-
and the DoD University Research Initiative. Manuscript received
August 24, 1992.

II. DELTA-SIGMA CONVERSION



A delta-sigma converter consists of two parts (Fig. 1): an analog
modulator and a digital decimation filter system. The modulator of
the delta-sigma converter has its digital output latched and fed
back to subtract from the analog input signal. Thus its 1-bit
output stream y(nT) (n is the sequence number, T is the sampling
period) tracks the change of the input analog signal: when the
analog signal increases, d(t) increases and the modulator produces
positive pulses, which subtract from the analog signal to make d(t)
smaller and make it tend toward producing negative pulses. The
density of the output pulses is proportional to the input
amplitude, and after more processing in the digital decimation
filter, the analog input signal can be reconstructed in digital
form.
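The feedback loop described above can be sketched in a few lines of Python. This is only a behavioral illustration under stated assumptions: an ideal discrete-time integrator, a 1-bit quantizer that keeps the sign, and inputs normalized to the feedback level (magnitude below 1). The function names are hypothetical and are not taken from the paper or from any simulator it mentions.

```python
def delta_sigma_modulate(samples):
    """First-order delta-sigma modulator with an ideal integrator.

    samples: input values with magnitude below 1 (the feedback level).
    Returns the 1-bit output stream as a list of +1/-1 values.
    """
    integrator = 0.0
    feedback = 0.0  # previous output, fed back through the 1-bit D/A
    bits = []
    for x in samples:
        # Accumulate the difference between the input and the feedback.
        integrator += x - feedback
        # 1-bit quantizer: only the sign of the integrator is kept.
        y = 1 if integrator >= 0 else -1
        bits.append(y)
        feedback = float(y)
    return bits


def mean_output(level, n=4096):
    """Average of the bit stream for a constant (dc) input level."""
    bits = delta_sigma_modulate([level] * n)
    return sum(bits) / len(bits)
```

For a constant input the average of the bit stream settles close to the input level, which is the "pulse density proportional to input amplitude" behavior; averaging here stands in for the decimation filter.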


Fig.  1  Structure  of  a  delta-sigma  converter. 
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There  are  two  criteria  for  the  delta-sigma  modulator  to 
operate correctly. First, the D/A feedback signal has to be larger
than the maximum input analog signal. Otherwise, the information
about how large the peak analog signal is will be lost. Second, the
sampling frequency has to be much larger than the signal bandwidth,
instead of just being twice the bandwidth as in other converters.
Due to the tracking and averaging nature of the delta-sigma
converter, the output will be more accurate if the input signal
does not change much during many sampling periods. Thus the
delta-sigma converter is often referred to as an oversampling
converter.

In an A/D converter, noise is introduced upon quantizing the analog
signal into a digital signal. In the delta-sigma converter case,
assuming the quantizer generates white noise whose rms value is
$e_0 = \Delta/\sqrt{12}$, where $\Delta$ is the quantization step
size, the total noise in the base band $n_0$ is given by

$$n_0 = e_0 \frac{\pi}{\sqrt{3}} \left(\frac{2 f_0}{f_s}\right)^{3/2} \qquad (1)$$

where $f_s$ is the sampling rate and $f_0$ is the signal bandwidth
[1]. From Eq. 1 we see that with an increase of the sampling
frequency $f_s$, the net quantization noise is reduced and the
resolution therefore increases. Quantitatively, with every doubling
of $f_s$, the signal-to-noise ratio (S/N) increases by 9 dB, which
corresponds to 1.5 bits of resolution.
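The 9 dB and 1.5 bit figures can be checked with a short worked step. It assumes the in-band noise of a first-order modulator scales as $(2 f_0 / f_s)^{3/2}$, which is the standard first-order result, and the usual rule of about 6.02 dB per bit.

```latex
% Doubling f_s at fixed f_0 and fixed signal level:
\frac{n_0(2 f_s)}{n_0(f_s)} = 2^{-3/2}
\quad\Longrightarrow\quad
\Delta(\mathrm{S/N}) = 20 \log_{10}\!\left(2^{3/2}\right)
                     = 30 \log_{10} 2 \approx 9.03\ \mathrm{dB},
\qquad
\frac{9.03\ \mathrm{dB}}{6.02\ \mathrm{dB/bit}} \approx 1.5\ \mathrm{bits}.
```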

III. DELTA-SIGMA CONVERTER IN
SUPERCONDUCTING TECHNOLOGY


A unique feature in implementing a delta-sigma converter in
superconducting technology is the lack of a high-performance analog
integrator, which requires a wideband operational amplifier. The
bandwidth of the amplifier must be at least as high as the sampling
frequency in this application [2]. But the Josephson junction, the
active element in superconducting technology, is a two-terminal,
low-gain device. Despite many efforts, there is still no suitable
wideband amplifier based on the Josephson junction, so some way
must be sought to replace the integrator.


Fig. 2 Frequency response of (a) an ideal integrator and a
practical integrator, (b) a low-pass filter. We can see
their similarities.


A low-pass filter has a frequency response similar to that of an
integrator. In Fig. 2, the frequency responses of a first-order
filter and a practical integrator are compared. The transfer
function of an ideal integrator has a pole at zero and a 6 dB/octave
roll-off. But due to problems in the implementation, e.g., finite
gain and slew rate of the amplifier and capacitor leakage, the
transfer function always saturates at low frequency. The low-pass
filter has a similar characteristic. The difference between the two
is gain; the integrator has a higher gain. But the significance of
this difference is lessened by the presence of a quantizer in the
following stage. The quantizer only tracks the sign of the signal,
not its magnitude. Therefore, an ideal quantizer will not recognize
the difference, and a low-pass filter modulator would give the same
digital output as an integrator modulator. Nonideality in the
quantizer, such as its hysteresis, will affect the output [2], but
due to the averaging nature of the delta-sigma converter, rare
switching errors will be averaged out.
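The similarity between the two responses can be illustrated numerically. The sketch below assumes textbook transfer functions, an ideal integrator $1/(jf)$ and a first-order low-pass $1/(1 + jf/f_c)$; the corner frequency and function names are hypothetical choices for illustration, not values from the paper.

```python
import math


def integrator_gain(f, f_unity=1.0):
    """Magnitude of an ideal integrator H(f) = f_unity / (j f)."""
    return f_unity / f


def lowpass_gain(f, f_c, dc_gain=1.0):
    """Magnitude of a first-order low-pass H(f) = dc_gain / (1 + j f / f_c)."""
    return dc_gain / math.sqrt(1.0 + (f / f_c) ** 2)


def rolloff_db_per_octave(gain, f):
    """Change in gain, in dB, when the frequency doubles from f to 2 f."""
    return 20.0 * math.log10(gain(2.0 * f) / gain(f))


f_c = 1.0  # hypothetical corner frequency, arbitrary units


def lowpass(f):
    return lowpass_gain(f, f_c)


high = rolloff_db_per_octave(lowpass, 100.0 * f_c)   # well above the corner
low = rolloff_db_per_octave(lowpass, 0.01 * f_c)     # well below the corner
ideal = rolloff_db_per_octave(integrator_gain, 100.0 * f_c)
```

Well above the corner the low-pass filter rolls off at essentially the same rate per octave as the ideal integrator, while below the corner its gain is flat, which mirrors the low-frequency saturation of a practical integrator.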

The above principle is checked by using a behavioral-level
delta-sigma simulation package, SDSIM [3]. The package analyzes the
performance of a delta-sigma converter by modeling the behavior of
the different modulator components: integrator or low-pass filter,
quantizer, and D/A latch. It can also take into account practical
parameters, such as dithering of the sampling clock and the
hysteresis of the quantizer. SDSIM has the advantage of speed, so
that the user can quickly find the relationship between performance
and circuit parameters. The use of device-level simulators, such as
JSPICE or JSIM, to simulate a delta-sigma converter is very slow
because each simulation involves tens of thousands of samplings,
and thus millions of simulation time steps. Hence, tools like JSIM
are only used for confirming the superconducting circuit design
here.

The performances of the modulators with a first-order low-pass
filter and a single integrator are compared in Fig. 3. In these
simulations, the input is a sine wave. The oversampling ratio is
128 and the filter 3 dB frequency is $1/(2\pi \cdot 64)$ of the
sampling frequency. We can see that at high signal levels, the
low-pass filter modulator has about the same S/(N + D) [signal
power/(noise power + harmonic power)] as the integrator modulator,
but the integrator modulator has a larger dynamic range and is more
linear. The low-pass filter modulator can achieve an S/(N + D) of
70 dB, which corresponds to 11.4 bits of resolution.

Fig. 3 Simulation results of a low-pass filter modulator (solid
line) are compared to the integrator modulator (x and dashed
line). The horizontal axis is the input signal level in dB.
The oversampling ratio is 128.

Fig. 4 Digital output amplitude is plotted against the analog
input amplitude (horizontal axis: input dc signal level).
The central platform is enlarged.

The linearity of the low-pass filter modulator is further simulated
by feeding dc signals of various amplitudes. In Fig. 4, the digital
output is plotted against the input signal. We can see that the
modulator has excellent integral linearity. But due to the finite
3 dB frequency $f_c$ of the low-pass filter, the curve has platforms
at certain input signal levels, as also found in the leaky
integrator delta-sigma converter system. The largest platform is at
zero input and is shown in the inset in Fig. 4. This platform limits
the modulator’s dynamic range. When $f_c = f_s/(2\pi \cdot 64)$,
this platform is 0.7% of the peak input signal, which means that
the dynamic range is 45 dB. This is also indicated by the simulation
with a sine wave input (Fig. 3). The dynamic range can be improved
by decreasing $f_c$ and using a second-order low-pass filter. We
found in simulation that the dynamic range increases to 56 dB
(9 bits) when $f_c = f_s/(2\pi \cdot 200)$ and to 62 dB (10 bits)
when $f_c = f_s/(2\pi \cdot 400)$.
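The decibel and bit figures quoted in this section can be related with the common full-scale sine-wave rule, S/N = 6.02 N + 1.76 dB. The paper does not state which conversion it used, so the helper below is an assumption and its name is hypothetical; it is offered only as a quick consistency check.

```python
def effective_bits(snr_db):
    """Resolution in bits from S/N in dB, assuming S/N = 6.02 N + 1.76 dB."""
    return (snr_db - 1.76) / 6.02


quoted_db = [56.0, 62.0, 70.0]
bits = [effective_bits(db) for db in quoted_db]
```

Under this rule 56 dB and 62 dB come out very close to 9 and 10 bits, and 70 dB gives about 11.3 bits, in line with the roughly 11 bits quoted for the peak S/(N + D).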


Fig. 5 Superconducting circuit of a delta-sigma
low-pass modulator


A superconducting circuit based on the above principles is shown in
Fig. 5. The system consists of an input transformer, feedback
transformer, low-pass filter, QFP [4] comparator, readout SQUIDs,
and HUFFLE [5] feedback D/A converter. The input signal is coupled
into the low-pass filter by the input transformer. The output of
the low-pass filter goes into the quantum flux parametron (QFP)
comparator. The comparator gives a positive output when the input
is positive, so the readout SQUID S1 will be switched into the
voltage state. This causes a current to flow into the control line
of the SQUID in the HUFFLE circuit. Thus, the current in the
HUFFLE’s output inductor flows from left to right, and this is
coupled back into the low-pass filter to cancel the effect of the
positive input current to the QFP comparator. If the input signal
to the comparator is negative, SQUID S2 is switched into the
voltage state and the feedback signal again cancels the input
signal. The QFP’s output switches between +1 and -1. The digital
signal can be read out from the readout resistors.

The low-pass filter is a very critical component; a large $f_c$
will result in more noise. As expected, in the simulation, we found
that when $f_c$ is larger than $f_s/(2\pi \cdot 30)$, the S/(N+D)
decreases sharply, and when $f_c$ is less than $f_s/(2\pi \cdot 30)$,
the S/(N+D) remains high and is insensitive to $f_c$. So the
trade-off is that a large $f_c$ will increase the signal level after
the low-pass filter and ease the comparator design, but decrease
the dynamic range and S/(N+D). Here we chose $f_c$ to be
$1/(2\pi \cdot 64)$ of the sampling frequency.

A QFP is used here not only for its capability of distinguishing
bipolar signals and its ultra-high speed, but also for its
extremely high sensitivity, because the signal after the low-pass
filter is quite small. A QFP can resolve signals down to a few
microamperes. The HUFFLE circuit employed here is also very
critical to the circuit performance. The converter’s peak input
signal is limited to the HUFFLE circuit output level; this and the
level of the noise floor determine the maximum dynamic range of the
converter, and it cannot be improved by faster sampling. In the
simulation presented above, only the quantization noise is taken
into account.


Fig. 6 The output digital signal after the filter operation
compared to the input sine wave.

The circuit was simulated by JSIM. In the simulation, the sampling
frequency was 1 GHz and the oversampling ratio was 128, so the
signal bandwidth was limited to 4 MHz. The amplitude of the input
signal was 20 dB down from the HUFFLE circuit feedback current, and
the analog input frequency was 1 MHz. The output signal after the
decimation filter is compared to the input sine wave in Fig. 6 to
illustrate the correct operation in the time domain. The
minimum-sinusoidal-error analysis shows that it reaches an S/(N+D)
of 53 dB and an S/N of 55 dB, which is very close to the value
obtained from the SDSIM program at the same input signal level
(Fig. 3).
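The 4 MHz bandwidth follows from the sampling rate and the oversampling ratio. The helper below is a minimal sketch that assumes the usual definition of the oversampling ratio as the sampling rate divided by twice the signal bandwidth; the function name is hypothetical.

```python
def signal_bandwidth_hz(sampling_rate_hz, oversampling_ratio):
    """Signal bandwidth f_0, taking the oversampling ratio as f_s / (2 f_0)."""
    return sampling_rate_hz / (2.0 * oversampling_ratio)


# Values quoted for the JSIM run: 1 GHz sampling, oversampling ratio 128.
bandwidth = signal_bandwidth_hz(1.0e9, 128)
```

With these values the bandwidth is 3.90625 MHz, which the text rounds to 4 MHz; the 1 MHz test tone lies well inside it.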

IV. DIGITAL DECIMATION FILTER IN
SUPERCONDUCTOR TECHNOLOGY

The decimation filter is a very important part of the delta-sigma
converter. It serves four purposes: suppressing the out-of-band
high-frequency quantization noise; preventing the aliasing of the
out-of-band signal into the passband; maintaining the passband
ripple within requirements; and down-sampling the output signal.
Depending on the application and the structure of the analog front
end, many different filter implementations are applicable. For most
applications, a cascade of a linear-phase sinc FIR [6] filter and
an IIR [6] low-pass filter can achieve the specifications. The
delta-sigma converter became feasible only after integrated circuit
technology was mature enough to support the complex design of the
digital filter. In this section, we will show that the current
superconducting integrated circuit technology can also support
decimation filter implementation.

An FIR filter is used at the first stage mainly because of its
hardware simplicity. The output of the modulator is a one-bit
signal, so the multipliers in the FIR filter can be simplified to
AND gates. For a first-order delta-sigma converter, a second-order
sinc filter is sufficient. It down-samples the signal to twice the
Nyquist frequency and leaves the following sharper IIR filter to
finish the rest of the decimation. Due to the decimation and the
simplicity of a second-order sinc filter, the hardware can be
multiplexed to further reduce its complexity, and a very simple
circuit implementation is available [7]. For that circuit design,
with 12-bit wide filter coefficients and a decimation ratio of 64
(128 taps), we estimated that it will take about 2000 MVTL gates.
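The first decimation stage can be sketched in software as follows. This is an illustrative model, not the multiplexed hardware design of [7]: it assumes the second-order sinc response is two cascaded moving averages of length equal to the decimation ratio, and it represents the modulator output as 0/1 bits. All names are hypothetical.

```python
def sinc2_coefficients(ratio):
    """Taps of a second-order sinc filter: a length-`ratio` boxcar
    convolved with itself, giving a triangular impulse response."""
    taps = [0] * (2 * ratio - 1)
    for i in range(ratio):
        for j in range(ratio):
            taps[i + j] += 1
    return taps


def decimate_bitstream(bits, ratio):
    """Filter a 0/1 bit stream with the sinc^2 FIR and keep one output
    per `ratio` input samples, normalized to the range 0..1."""
    taps = sinc2_coefficients(ratio)
    n_taps = len(taps)
    gain = sum(taps)
    outputs = []
    for end in range(n_taps, len(bits) + 1, ratio):
        window = bits[end - n_taps:end]
        acc = 0
        for bit, coeff in zip(window, taps):
            # With a one-bit input the multiplier reduces to a gate:
            # the coefficient is passed only when the bit is 1.
            if bit:
                acc += coeff
        outputs.append(acc / gain)
    return outputs


taps64 = sinc2_coefficients(64)
half_density = decimate_bitstream([1, 0] * 512, 64)
```

For a decimation ratio of 64 this construction yields 127 nonzero taps, consistent with the roughly 128 taps quoted above, and a bit stream with 50% pulse density decimates to a constant 0.5.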

For an application where the phase linearity of the output signal
is not important, an IIR low-pass filter is applied as a second
stage to remove the remaining out-of-band noise and further
down-sample the signal. Because the elliptic filter has the
narrowest transition band among all filters of the same order, a
fourth-order elliptic filter is used in our simulation. Since the
filter coefficients are predetermined, an area-efficient
architecture, called bit-serial implementation, is adopted [8]. In
this architecture, all additions and multiplications are serially
implemented in a bit-by-bit operation. Though it is slower than the
parallel implementation, the signal rate is already reduced by the
FIR filter, and the bit-serial architecture can be used here to
save area and circuit complexity. After considering the coefficient
quantization noise effect, 16-bit wide coefficients are enough for
a 12-bit converter. We estimate that a fourth-order elliptic filter
will consist of 2000 MVTL gates. Thus, the total filter requires
4000 gates and approximately 12,000 junctions, which is within the
limit of current Josephson technology.

V.  CONCLUSION 

A modified delta-sigma converter architecture is presented wherein
the integrator in the analog modulator is replaced by a first-order
low-pass filter. This enables the delta-sigma architecture to be
implemented in superconducting circuit technology. A
superconducting delta-sigma analog modulator was presented.
Simulation shows that the converter can achieve an 11-bit
resolution (70 dB in peak S/(N+D)) with an oversampling ratio of
128. The signal bandwidth is 4 MHz if the sampling rate is 1 GHz.
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Abstract-  A  new  voltage  state  multiple  input  NOR  gate 
has  been  designed  and  tested  for  use  as  the  basic  gate  in  a  5-32 
bit  parallel-input  decoder.  Two  versions  of  this  NOR  gate  are 
presented,  one  with  a  single  output  and  one  with  a  selectable 
output.  The  combination  of  the  two  types  of  NOR  gate  makes 
it possible to construct a 5-32 bit decoder with considerably
less gate current than would be required if it were constructed
in other logic families. Since only a single gate current is
required by each NOR gate, and because only 12 NOR gates
are needed to build the full decoder, a clock with a peak current
level of only 6 mA is sufficient to power all of the decoder’s 72
constituent SQUIDs. The decoder also occupies a small area
compared with other designs. In this paper we review critical
design issues of the NOR gates. We also present low-speed and
high-speed results of sub-blocks of the full 5-32 bit decoder.

I. INTRODUCTION

In an application such as a superconductive crossbar switch (a
massive switching network), about 100 superconductive decoders must
receive ac clock power, each from a separate transmission line.
Each of these lines originates in a room-temperature environment
and then descends into a 4 K dewar to the decoders. To minimize the
heat flow from room temperature into the 4 K dewar, each
transmission line must have a small cross section. This constraint,
together with the technological limit on how thin the transmission
line conductors and dielectrics can be made, makes it difficult to
form transmission lines with low characteristic impedances. Thus,
room-temperature voltage drivers can drive only a limited amount of
ac (~1 GHz) current through each transmission line. Consequently,
the amount of allowable gate current for each decoder must be kept
small. We present a voltage-latching decoder architecture which
consumes far less gate current than if it were designed in other
voltage-latching logic families. We accomplish this task with two
new special-purpose circuits: a NOR gate, and a NOR gate with a
selectable current output. Because of the simplicity of our decoder
design, only a small chip area is required (~400 μm × 1500 μm) per
decoder. This compact design is particularly important in a
crossbar application where we must fit 32 5-32 bit decoders on a
1 cm² chip.

II. DESIGN OF THE 3-INPUT NOR GATE

A. Description of the 3-input NOR gate


Fig. 1 Basic block of the 5-32 bit decoder. Left stack of four
SQUIDs is 3-Input NOR gate. Right stack of ten
SQUIDs is 2-Input NOR gate with 8 channels for
current select output.


The 3-input NOR gate, which is used eight times in our decoder, is
schematically represented by the stack of four SQUIDs shown on the
left-hand side of Fig. 1. Note that the four SQUIDs are stacked
together in series. The control line of each SQUID is connected to
exactly one of the inputs A, B, C, and SET. Figure 2 contains three
plots which describe the operation of this gate. Figure 2a shows a
SQUID threshold curve corresponding to the SQUID with the A input.
Figure 2b shows the I-V characteristic of that SQUID together with
the resistive load line R1. Figure 2c shows the threshold curve of
the SQUID corresponding to the SET input.

Fig. 2 Operation of the 3-Input NOR gate of Fig. 1 for the
case A=1, B=0, C=0. (a) Threshold curve for
SQUID corresponding to A input. (b) I-V characteristic
of same SQUID with load resistor R1. (c)
Threshold characteristic for SQUID with SET input.

We will explain how the NOR gate works for the input case A=1, B=0,
C=0. First, a gate current Igate1 less than the critical current of
any of the SQUIDs is applied to the stack of four SQUIDs; this is
represented by point α in each of the three plots in Fig. 2. Next,
the inputs are applied. In this case, input current is applied to
input A only. As a consequence, the SQUID corresponding to the
input A enters the voltage state, as can be seen in Fig. 2a and 2b
at point β. Note that at this time, most of the gate current Igate1
is switched out into the resistor R1. Thus, only a small amount of
current flows into the gate of the bottom SQUID of the stack, so
that later, when the SET control current is applied, that SQUID
remains in the zero-voltage state and no current flows into R2.
Point γ of Fig. 2c shows that the SET signal application can at
most force the bottom SQUID to make a vortex-to-vortex transition,
but the SQUID cannot make a vortex-to-voltage state transition as
long as the gate current to the SQUID is kept small [1]. The
absence of current through R2 represents a logical zero at the
output of the gate. The gate is reset by turning off the gate
current Igate1.


Now consider the input case A=0, B=0, C=0. Figure 3 shows (a) the
SQUID threshold curve and (b) the I-V characteristic for the bottom
SQUID of the stack. Again, when the gate current is applied,
operation is at point α shown in Fig. 3. Since none of the three
inputs is applied, the gate current Igate1 is not diverted into R1,
and operation continues to reside at point α. Next, the SET current
is applied, and the bottom SQUID is forced into the voltage state,
as shown by point β. Since the SQUID is loaded by both of the
resistors R1 and R2 in parallel, the load line intersects the SQUID
I-V characteristic deep in the subgap. Thus, the gate current
Igate1 flows partly through R1 and partly through R2. The current
which passes through R2 represents a logical “1” at the output.
From these two examples of inputs, it is clear that the gate
performs the NOR function: $\overline{A+B+C}$.

Fig. 3 Operation of the 3-Input NOR gate of Fig. 1 for the
case A=0, B=0, C=0. (a) Threshold curve for
SQUID corresponding to SET input. (b) I-V characteristic
for the same SQUID with load resistors
R1 || R2.
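The logical behavior of the stacked gate described above can be summarized in a purely behavioral model. This sketch ignores all circuit-level detail (thresholds, load lines, margins); it only encodes the rule that any active input diverts the gate current into R1 before the SET signal arrives. The function name and signature are hypothetical.

```python
from itertools import product


def nor3_stack(a, b, c, set_applied=True):
    """Behavioral model of the four-SQUID stacked NOR gate.

    Returns True when current flows through R2 (logical "1").
    """
    # Any input SQUID that enters the voltage state shunts most of the
    # gate current into R1, starving the bottom (SET) SQUID.
    diverted = a or b or c
    # The bottom SQUID can switch only if it still carries the gate
    # current when the SET signal is applied.
    return bool(set_applied and not diverted)


truth_table = {
    inputs: nor3_stack(*inputs) for inputs in product([False, True], repeat=3)
}
```

Only the all-zero input combination produces an output, and without a SET signal the gate never produces one.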


B. Choice of values for R1 and R2

As was previously mentioned, a SQUID I-V characteristic which is
representative of the SQUID corresponding to the input A is shown
in Fig. 2b. Note that the same I-V characteristic applies to the
two SQUIDs in the middle of the stack should the B or C inputs be
applied. If any of the inputs A, B, or C is a “1”, then R1 must be
small enough to guarantee that most (~80%) of the gate current is
shunted away from the bottom SQUID, so that the latter will not
enter the voltage state upon application of the SET control
current. The question remains as to how R2 should be chosen. If A,
B, and C are all “0”, then upon the application of the SET signal,
at least half of the gate current should leave through R2 to ensure
that it can drive the input of the next gate. Thus, R2 should be
chosen to be less than or equal to R1. This choice of R2 makes the
parallel resistance R1 || R2 ≤ R1/2, and it is the reason that the
bottom SQUID latches deep into the subgap portion of the I-V
characteristic as shown in Fig. 3b. To assure a latching current
output for the resistor R2, R1 || R2 must not be so small that the
bottom SQUID resets; a sensible choice is R1 = R2.
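The two resistor conditions above follow from a simple current-divider argument. The worked step below is a sketch that assumes the switched bottom SQUID sits at a voltage $V$ and that essentially all of the gate current leaves through the two resistors (the small subgap current through the SQUID itself is neglected).

```latex
% Bottom SQUID in the voltage state, loaded by R_1 and R_2 in parallel:
\frac{I_{R_2}}{I_{R_1} + I_{R_2}}
  = \frac{V/R_2}{V/R_1 + V/R_2}
  = \frac{R_1}{R_1 + R_2} \;\ge\; \frac{1}{2}
  \quad\Longleftrightarrow\quad R_2 \le R_1 ,
\qquad
R_1 \parallel R_2 = \frac{R_1 R_2}{R_1 + R_2} \;\le\; \frac{R_1}{2}
  \quad\Longleftrightarrow\quad R_2 \le R_1 .
% With R_1 = R_2 the output carries exactly half of the current and
% the parallel load equals R_1 / 2.
```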

C. The need for additional flux gain


It was mentioned that at least half of the gate current must leave
through the output of the NOR gate to “drive” the input of the next
gate. In theory, even a tiny amount of input current can switch a
SQUID. In practice, however, a large amount of output current is
needed to guarantee that the SQUID will switch over a wide range of
applied gate currents. One of the advantages of the stacked NOR
gate design is that each SQUID has a single input. This makes it
feasible to use transformer coupling for each SQUID with a turns
ratio of 2:1. The 2:1 transformer amplifies the external flux input
to each SQUID by nearly a factor of two, which is equivalent to
doubling the drive current from the previous stage.


III. FUNCTION OF BASIC DECODER BLOCK


As was mentioned earlier, the 5-32 bit decoder consists of two
kinds of NOR gates. The single output NOR gate was discussed in
Section II. It consists of four SQUIDs and is shown in the left
side of Fig. 1. The second kind of NOR gate consists of a stack of
ten SQUIDs and is shown in the right side of Fig. 1. This NOR gate
is different from the previously described gate only in that its
output can be selected through any one of the eight output
resistors labelled Rout1 through Rout8. Figure 1 shows the
interconnection of these two kinds of NOR gates. We call this
interconnection the basic decoder block, since the functionality of
this block is essential to the functionality of the full 5-32 bit
decoder.
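A behavioral sketch of the basic decoder block and of the SQUID count of the full decoder is given below. It rests on two assumptions that should be read as such: the block output on Rout4 is taken to be active only when none of A, B, C and neither D nor E is applied (as the operating description that follows indicates), and the split of the twelve NOR gates into eight four-SQUID gates and four ten-SQUID gates is inferred from the figures quoted in this paper rather than stated explicitly. The function names are hypothetical.

```python
from itertools import product


def basic_block_rout4(a, b, c, d, e, set_applied=True):
    """Behavioral model of the basic decoder block output on Rout4."""
    # First stage: 3-input stacked NOR gate driven by A, B, C and SET.
    first_stage = set_applied and not (a or b or c)
    # Second stage: the selected SQUID switches only if it is driven by
    # the first stage and neither D nor E has diverted its gate current.
    return bool(first_stage and not (d or e))


def decoder_squid_count(n_single=8, n_select=4,
                        squids_single=4, squids_select=10):
    """Total SQUIDs, assuming 8 single-output and 4 selectable gates."""
    return n_single * squids_single + n_select * squids_select


active = [bits for bits in product([False, True], repeat=5)
          if basic_block_rout4(*bits)]
```

Under these assumptions exactly one of the 32 input combinations activates Rout4, and the count comes to 72 SQUIDs in 12 gates, matching the totals given in the abstract.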
The basic decoder block operates as follows: First, current is
applied to Igate1 and Igate2. Next, any one of the five inputs A,
B, C, D, or E is applied. As was mentioned earlier, if none of the
inputs A, B, or C is applied, then upon application of a SET
signal, current is steered through R2 into the input of the SQUID
in the ten-SQUID stack corresponding to Rout4. If neither the D nor
the E input is applied, then this SQUID switches into the voltage
state and the gate current is diverted around the SQUID into the
corresponding output resistor Rout4. Current continues to flow
through the gates of the other seven SQUIDs, so only Rout4 is the
selected output. Consequently, the basic decoder block performs the
function: $R_{out4} = \overline{A+B+C} \cdot \overline{D+E}$.


IV. LOW-SPEED EXPERIMENTAL RESULTS

A. The resonance problem

In testing the single output NOR gate it was noticed that the
output (bottom) SQUID could latch into the voltage state at a
voltage which was considerably smaller than expected. The problem
was found to occur for small values of the gate current. Figure 4
illustrates two modes of operation of the NOR gate: one in which
the bottom SQUID latches into the subgap as shown by point B
(proper operation) and one in which the SQUID latches into a
resonance as shown by point A (failure). Resonances in SQUIDs are
caused when an oscillation is set up between the capacitance of the
junctions and the loop inductance [2]. These oscillations can only
occur when a SQUID loop contains an amount of flux other than an
integral multiple of a flux quantum $\Phi_0$. In Fig. 4, we apply
an amount of external flux through the bottom SQUID equal to about
$\Phi_0/2$, an amount of flux at which the resonance is most
pronounced. The resonance peak of the bottom SQUID of the NOR gate
is shown as a curved line in Fig. 4 next to point A. The load line
must avoid the resonance peak to ensure proper operation of the
gate. Load line #2 of Fig. 4 intersects that resonance peak since
the initial gate current is not large enough. However, load line #1
has a sufficiently large initial gate current to bypass the
resonant peak. The initially applied gate current, for successful
operation of the NOR gate, must be larger than the gate current
corresponding to point C and must be less than the maximum critical
current of the SQUID (point D). This difference should be as large
as possible to maximize the gate-current margin. Also, R1 and R2 of
the NOR gate should be as large as possible so that the load line
R1 || R2 will be less likely to intersect the resonance peak. In
any case, R1 and R2 must meet the conditions discussed in
Section II.

Fig. 4 I-V characteristic of the bottom SQUID of the
single output NOR gate. Intersection of the
resistive load line with the resonance peak of
the SQUID is shown.

B. Solution to the resonance problem

One way to ameliorate the resonance problem is to suppress the
height of the resonance peak. This is typically done by placing a
damping resistor across the SQUID’s loop inductance. This damping
resistor helps to reduce the height of the peak, but cannot
eliminate it. Typically, the minimum height of the resonant peak is
approximately 40% of the SQUID’s maximum critical current with
damping. This peak limits the gate margin of the NOR gate.


We present a circuit solution to avoid getting caught in a
resonance: drive the SET line of the basic gate with a pulse of
current instead of a step current. The idea stems from the fact
that a resonance can exist only as long as external flux is present
in the SQUID. By sending a pulse of current into the SQUID, the
resonance can only live for a short time, and eventually the SQUID
must latch into the subgap. In our basic decoder block, the output
current from the resistor R2 latches, thus generating a sustained
external flux in the SQUID corresponding to Rout4 which can cause
that SQUID to latch into a resonance. This is avoided by preventing
the current in R2 from latching by choosing R2 to be small. We
denote this version of the decoder block with R2 small as the
“nonlatching” version.

C.  The  basic  decoder  block 

A version of the basic decoder block shown in Fig. 1 with
fewer output SQUIDs was first fabricated and demonstrated in
our laboratory. Subsequently, the full decoder block of Fig. 1
was fabricated by Hypres Inc. Here we present results on the
nonlatching version of the basic block, as discussed in the previous
section, although the latching version was also fabricated
and tested. Figure 5 shows oscilloscope traces of input and output
waveforms of the test on the nonlatching version. Signals
were applied to the A, D, and E inputs, which can be seen on
lines 3, 4, and 5, respectively, from the top of the photograph.
The gate currents Igate1 and Igate2 were applied simultaneously
(sixth line of Fig. 5). The second line shows the SET pulses,
which are intentionally delayed to arrive after the inputs. The
output at Rout4 can be seen on the first line. Note that this test
of the basic block was done for eight different sets of inputs.
The basic block was successfully demonstrated in that the output
Rout4 was a “1” only when all three of the inputs were
“0”. Experimentally, it was found that the gate currents Igate1
and Igate2 could be simultaneously varied by about +/- 23%.
Simulations of the circuit suggest that the margins should be
about +/- 30%. Nonuniformities of +/- 7% in the maximum
critical current of the SQUIDs are responsible for this
discrepancy. These nonuniformities determine the maximum
currents Igate1 and Igate2 which
can be applied to the NOR gates and thus limit the upper end of
the gate margin. The minimum allowed currents
Igate1 and Igate2 are determined by the size of the pulse which
leaves R2. If this pulse is not large enough, it will not be able to
drive the next stage. The experimentally observed minimum
allowed gate current has been shown to be in good agreement
with simulation provided that the simulation takes into account
the transmission line connecting the output of R2 to the input
of the next stage. The gate margin of the latching version of
our decoder block was experimentally observed to be +/- 18%.
In these observations it was found that below a certain gate
current value the output resistor Rout4 would latch into a
smaller voltage than was expected. Simulation showed that our
heavily shunted output SQUID corresponding to Rout4 was
actually latching into a resonance instead of into the subgap, as
was explained in Section IV. In this simulation, parasitic inductance
associated with the damping resistors of the SQUIDs is
essential. As we had expected, resonances at the output resistor
Rout4 of the nonlatching version were not observed. The
gate margins could be increased by increasing the
critical current of the single-output NOR gate and clocking
Igate1 at a higher level than Igate2. This would increase the size
of the current pulse which leaves R2. However, it would also
increase the decoder's clock current.

D.  Full  5-32  bit  decoder 

Figure 6 shows a circuit schematic of the entire 5-32 bit
decoder. The basic decoder sub-block (in Fig. 1) is outlined by
the two dotted-line boxes. The full decoder is constructed from
the interconnection of eight of the single-output NOR gates
with four of the selectable-output NOR gates as shown in the
figure. The 8 SET inputs corresponding to each of the single-output
NOR gates are connected in series. Note that there are 3
inputs for each of these 8 gates, corresponding to a total of 24
inputs. These 24 inputs are subdivided into six sets of 4 serially
connected inputs. Each of these six sets of inputs is driven by a
decoder address line. There are 10 decoder address lines consisting
of the 5 decoder addresses and their inverses. Similarly,
the 8 inputs corresponding to the 4 multiple-output NOR gates
are connected in 4 sets of 2 serially connected inputs. Each of
these 4 sets is driven by one of the 4 remaining decoder address
lines. Connections are made such that there exists a unique output
(current through one of the 32 output resistors) for each of
the 32 possible sets of input addresses. The output is generated
upon the application of the SET signal.
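
The counting in this paragraph can be checked with a small logical model. The sketch below is an abstraction under stated assumptions, not a description of the fabricated circuit: the assignment of address bits 0-2 to the eight single-output gates and bits 3-4 to the four selectable-output gates is illustrative, an output is treated as the coincidence of one gate from each group, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: which address bits feed which group of gates is illustrative only.
FIRST_GROUP_BITS = (0, 1, 2)   # eight three-input single-output NOR gates
SECOND_GROUP_BITS = (3, 4)     # four two-input selectable-output NOR gates


def wire_gates(bit_positions):
    """Return {pattern: inputs}; each input is (bit position, use_inverse_line).

    A NOR gate fires only when all of its inputs are 0, so the gate that
    recognizes `pattern` takes the true line where the pattern bit is 0 and
    the inverse line where the pattern bit is 1.
    """
    return {
        pattern: list(zip(bit_positions, pattern))
        for pattern in product((0, 1), repeat=len(bit_positions))
    }


def nor_output(inputs, address):
    """Evaluate one NOR gate for a 5-bit address given as a tuple of bits."""
    return int(not any(address[pos] ^ inverse for pos, inverse in inputs))


FIRST = wire_gates(FIRST_GROUP_BITS)
SECOND = wire_gates(SECOND_GROUP_BITS)


def line_fanout(gates):
    """Count how many gate inputs each address line drives."""
    return Counter(line for inputs in gates.values() for line in inputs)


def selected_outputs(address):
    """List the (first-group gate, second-group gate) pairs that both fire."""
    return [
        (p1, p2)
        for p1, in1 in FIRST.items()
        for p2, in2 in SECOND.items()
        if nor_output(in1, address) and nor_output(in2, address)
    ]


if __name__ == "__main__":
    print(line_fanout(FIRST))    # fan-out of the six lines feeding the first group
    print(line_fanout(SECOND))   # fan-out of the four remaining lines
    outputs = [selected_outputs(a) for a in product((0, 1), repeat=5)]
    print(all(len(o) == 1 for o in outputs), len({o[0] for o in outputs}))
```

In this model the six lines of the first group each drive four gate inputs and the four remaining lines each drive two, matching the sets of serially connected inputs in the text, and every one of the 32 addresses selects exactly one output.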


As was mentioned earlier, it is of critical importance that
the decoder consume a small ac clock current. The 72 SQUIDs
of the decoder are contained in 12 gates. If each SQUID has a
maximum critical current of 500 µA, then the full decoder
consumes about 6 mA of ac clock current. Thus the decoder is
very efficient in its use of current biases. A full decoder was
fabricated by Hypres Inc. and tested at Berkeley. Due to wiring
errors, only 16 of the 32 outputs functioned correctly*.

Fig. 6 Circuit schematic of a full 5-32 bit decoder. Basic
decoder block (in Fig. 1) is represented here by
two NOR gates enclosed by dotted-line boxes.

V. HIGH SPEED EXPERIMENTAL RESULTS

A test of the nonlatching version of our basic block similar
to that of Section IV.C was conducted at high speed. The gate,
inputs, and SET signal were applied as current steps separated by
approximately 2 ns. The gate margin for the block
decreased to about +/- 15% at high speed. The gates could only
be reset every 40 ns due to limitations in our test setup, so
the resetting speeds of the gates could not be checked.

VI. CONCLUSIONS

We have presented a novel decoder to be used in a crossbar
switch. Resonances were shown to be a critical issue in our
design. We propose a nonlatching version of our decoder to
avoid resonances. A 2:1 transformer coupling for each SQUID
was found to increase the gate margins. The basic decoder
block was shown to work at both low and high speeds
(with the exclusion of a test of the resetting speed).


*Note added in proof: the corrected version of the decoder was fabricated
and we have demonstrated that all 32 outputs function correctly.

Abstract  We report simulation studies on several
novel concepts in superconductive signal processing
circuits, including an advanced shift register design
and innovative serial decoder concepts. The shift
register is a flux shuttle with very large operating
margins. The decoder implements a new serial
approach using shift registers and a single
comparator for each output.

1.  Introduction 

Multi-gigahertz digital signal processing requires very
high-speed analog-to-digital converters, storage
elements, and logic. Superconductive circuits can
provide the high speed necessary to implement these
components. In this paper, we describe a compact
high-speed flux-shuttle shift register as a storage
element, and a serial decoder. In addition, these circuits
have the advantage of being compatible with
nonhysteretic junctions, making them good prospects for
use with high-Tc superconductors.

2.  Flux-Shuttle  Shift  Register 

A high-speed, low-power, miniature shift register is
desirable for a variety of digital signal processing
applications. One application might employ an A/D
converter-shift register combination to be used as a
multi-gigahertz sample-and-hold circuit. Our design
goals are wide operating margins, shifting speeds in
excess of 20 GHz, compatibility with current
superconductive A/D converter designs, and high-speed
testability. These design studies are restricted to
magnetic flux storage circuits to maximize speed, to
allow compatibility with high-Tc superconductors, and to
minimize power consumption. Various designs have
been examined [1][2][3][4] with most attention focused
on the flux shuttle type [3][4] and one utilizing Rapid
Single Flux Quantum (RSFQ) logic [2].

An RSFQ shift register has many advantages. Perhaps
the biggest of these is the simple clocking scheme that
can be used: a traveling Single Flux Quantum (SFQ)
pulse. The simple architecture of a shift register makes
this circuit a prime candidate for such a clock
distribution, whereas a more complex circuit would have
to deal with multiple paths and dimensions. These
clocks can be provided at high speed, and the number of
pulses can be accurately controlled. This allows very
high speed test capability without high speed signal
lines from external circuits, since the clock can be
generated on-chip. Read-out can easily be accomplished
through a simple two-junction SQUID and can be either
voltage or current state. The margins for this circuit are
also very good.

The disadvantage of the RSFQ shift register is its
incompatibility with the A/D converter chosen for our
system. Although A/D converter circuits have been
proposed which are compatible with this type of shift
register [2], we are integrating the shift register with a
flash-type A/D converter developed in our laboratory
[5]. This A/D converter requires the use of a three-phase
sinusoidal overlapping clock. An SFQ pulse
generator can be used to convert the sinusoidal clock
into the pulses required by the RSFQ circuit, but in the
RSFQ shift register, the clock and data move in opposite
directions. Since the data must be synchronized with
the sinusoidally clocked A/D converter, there is a data
flow dependency problem with this configuration.

The flux-shuttle shift register proposed by Beha et al.
[4][6] uses a three-phase sinusoidal clock (Fig. 1a, 1b).
If a flux quantum is stored in the first cell, a clockwise
circulating current exists through the storage inductance
(L1) and two adjacent junctions (J1 and J2). When the
clock in the next cell (C2) becomes active, current is
coupled into J2, exceeding its critical current and
injecting flux into the two adjacent storage loops. On
the left side, this injected flux quantum cancels the one
stored in L1, and on the right side, the flux quantum is
stored in L2. This effectively transfers the flux quantum
from one cell to the next. The clocking scheme
presented by Beha uses a clipped, nonoverlapping
sinusoid. Although the circuit was simulated at 30 GHz,
this type of signal is very difficult to provide externally.
We have investigated the use of a low-temperature
Schottky diode on-chip to clip a three-phase overlapping
pure sinusoid. However, for extensive clock
distribution, this circuit poses a difficult microwave
transmission problem due to the high frequencies
associated with a clipped sinusoid. An alternative is to
use the unclipped clock directly in the shift register.
Although this reduces the operating margins, the circuit
still performs well.
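
The cell-to-cell transfer described above can be illustrated with a purely logical toy model. This sketch ignores all junction dynamics and clock waveforms; it assumes, for illustration only, that storage loop i is clocked by phase i mod 3 and that every transfer succeeds. The function names are hypothetical.

```python
def fire_phase(cells, phase):
    """Apply one clock phase to a row of storage loops.

    cells[i] is 1 when loop i holds a flux quantum. Loop i is assumed to be
    clocked by phase i % 3. When the clock of loop i + 1 fires, a quantum in
    loop i is cancelled there and stored in loop i + 1.
    """
    updated = list(cells)
    for i in range(len(cells) - 1):
        if cells[i] and (i + 1) % 3 == phase:
            updated[i] = 0       # injected flux cancels the stored quantum
            updated[i + 1] = 1   # and the quantum is stored in the next loop
    return updated


def run_clock(cells, cycles, order=(1, 2, 0)):
    """Fire the three phases in sequence for the given number of clock cycles."""
    for _ in range(cycles):
        for phase in order:
            cells = fire_phase(cells, phase)
    return cells


if __name__ == "__main__":
    row = [1, 0, 0, 0, 0, 0, 0]
    print(run_clock(row, 1))
    print(run_clock(row, 2))
```

In this idealized model a stored quantum advances one loop per clock phase, so three loops per full clock cycle.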

A potential advantage of the flux shuttle is the very low
power used in the serially-connected, inductively-coupled
clocking shown in Fig. 1a. This presents a
problem, however, since the inductance used for
coupling to the storage loop (L3) leaves less inductance
available to couple to the read-out SQUID (L4). This
limitation stems from the requirement that
$\beta_L = 2\pi L I_c/\Phi_0$, where $L = L_3 + L_4$, be on the order of
$2\pi$ for correct circuit operation. The inductively-coupled
clock lines also produce undesirable oscillations
in the current used to read out the state of the cell.
Various low-inductance methods of reading out the stored
state of the cell were investigated and a reverse-coupled
dummy shift register was used to cancel the effect of
the clock lines on the read-out SQUID. Although the
oscillations were eliminated, the low-inductance read-out
schemes still had insufficient coupling to achieve large
operating margins.


Fig. 1. Flux-shuttle shift register with (a) a three-phase
inductively-coupled clock and (b) a three-phase
directly-injected clock. Read-out is done via a two-junction
SQUID.

An alternative to the inductively supplied clock is to use
direct injection (Fig. 1b). This circuit operates like the
previous one, except that the clock bias current
is directly injected into the junctions. This form of
clocking frees all of the storage loop inductance for
coupling to a read-out SQUID and greatly reduces the
oscillation of the read-out control lines, thus improving
the read-out margins. However, it also presents a more
difficult clock distribution problem and additional power
dissipation. Figure 2 shows a method of clocking this
circuit which guarantees correct output for either
latching or nonlatching read-out. It uses only two
storage loops per bit and drives the read-out SQUID
bias with one of the three phases of the clock. The
second storage loop is used to couple to the read-out
SQUID. This use of the clock phases maintains the
correct logic value in the cell throughout the read
operation (2/3 of the clock cycle), increasing the read-out
portion of the clock cycle.


Fig. 2. Flux-shuttle shift register with a two-phase (C1,
C2) directly-injected clock. Read-out is done via a
two-junction SQUID biased by the C3 clock phase.

We have simulated the flux shuttle shift register with
direct injection sinusoidal clocking using JSIM [7].
These simulations indicate that the circuit will operate
correctly at speeds in excess of 40 GHz, and at 20 GHz
with voltage-state read-out "on the fly" using heavy
resistive shunting on the read-out SQUID (Fig. 3).
Although the low resistance shunt reduces the output
voltage, it prevents the read-out SQUID from staying in
the latched voltage state (punchthrough). Operating
margins were also checked using the program PSCAN
[8], which can be used to produce a two-dimensional
plot of circuit performance as a function of two circuit
parameters. Simulations indicated ± 50% margins on
the storage loop inductance, ± 60% margins on damping
resistance, and ± 15% margins on the clock bias. In
each case, the margins are defined holding all other
parameters at their midpoint values. The margins for the
critical current allowed for as much as ± 10% shifts
from the design value without adjusting the clock bias.
In addition to this, the circuit tolerates critical currents
that differ by 30% in adjacent shift register cells.


Fig. 3. Voltage-state read-out of the flux-shuttle shift
register operating at 30 GHz (horizontal axis: time in ns).
The two signals are produced by shifting a single "1" value
through two adjacent shift register cells.

3. Serial Decoder

We will describe a decoder scheme which is well suited
for flux-based logic. As will be shown, our design can
be implemented with shift registers using XOR read-out
gates. Thus the serial decoder presented is only slightly
more complex than a flux-based shift register. This
progression in circuit complexity provides a means to
determine if flux-mode shift registers can function
properly when integrated with other circuit elements.

The fundamental decoding elements in our design
include a closed-loop shift register, an XOR gate and a
latch. Figure 4 shows a block diagram of a portion of
our decoder. When the first bit (least significant) of a
five-bit input address to the decoder becomes valid, it is
simultaneously compared (XORed) with all five cells of
a shift register. If the result of a given comparison is a
match, then the XOR gate generates no output and fails
to set its corresponding latch. In the case of a
mismatch, the XOR gate produces a "1", setting its
latch. The shift register then shifts its data so the bit
contained in the fifth cell is transferred to the first cell,
the bit in the first cell is transferred to the second, and
so forth. Again, the five XORs are performed in
parallel and each XOR cell has a chance to set its
corresponding latch. We repeat this procedure five
times. At this point, the shift register has returned to its
initial state, and an "empty" latch represents a five-bit
match. A summary of the circuit output follows:

An input address of 00001 will prevent latch #1
from becoming set.

An input address of 00010 will prevent latch #2
from becoming set.

...

An input address of 10000 will prevent latch #5
from becoming set.

Thus, we have provided for five of the 32 possible input
patterns. To generate logic for 30 of these patterns we
need six shift registers, each with five XOR gates and
five latches, performing the above operation. We must
fill the shift registers as follows:

1st shift register - 10000
2nd shift register - 01111
3rd shift register - 11000
4th shift register - 00111
5th shift register - 11010
6th shift register - 00101

The last two of the 32 decoder cases can be covered
with two additional XOR gates. Each gate has one
input connected to the input address line and its other
input connected to "1" or "0". Figure 5 shows a block
diagram of the complete decoder. It should be noted
that the number of shift registers can be reduced to three
if we use inverting shift registers. This decoder has the
following advantages:

1) Only two lines have to be compared (fan-in = fan-out
= 2) independent of the number of bits in the
serial decoder. This maintains logic margins even
for flux-based logic where on-chip current-driving
capabilities are limited.

2) Flux-based logic typically requires a clock for each
gate (e.g. quantum flux parametron [9]). Thus,
decoder designs requiring multiple levels of logic
would require additional clocking to produce an
output. Although our design is implemented in a
single level of logic, it still uses multiple clock
cycles to do the decode function. However, since
this is done in parallel with the input of the serial
address, additional clocking is not necessary.
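
The serial decode procedure can be sketched as a bit-level simulation. This is a logical model only, with no circuit behavior; it assumes that each fill pattern lists the cells from first to fifth, reading left to right, and that an address string is written most significant bit first. The function name and latch labels are hypothetical.

```python
SHIFT_REGISTER_FILLS = ["10000", "01111", "11000", "00111", "11010", "00101"]


def empty_latches(address):
    """Return labels of the latches left unset after a 5-bit serial decode.

    `address` is a string such as "00010", most significant bit first; bits
    are presented to the decoder least significant bit first.
    """
    serial_bits = [int(b) for b in reversed(address)]
    empty = []
    for number, fill in enumerate(SHIFT_REGISTER_FILLS, start=1):
        cells = [int(b) for b in fill]        # cells[0] is the first cell
        latches = [0] * len(cells)
        for bit in serial_bits:
            for j, cell in enumerate(cells):
                if cell ^ bit:                # a mismatch sets the latch
                    latches[j] = 1
            # fifth cell -> first cell, first -> second, and so forth
            cells = [cells[-1]] + cells[:-1]
        empty += [
            f"SR{number}/latch{j + 1}"
            for j, value in enumerate(latches)
            if not value
        ]
    # Two additional XOR gates compare every address bit with a constant.
    for constant in (0, 1):
        if all(bit == constant for bit in serial_bits):
            empty.append(f"XOR{constant}")
    return empty


if __name__ == "__main__":
    for n in range(32):
        address = format(n, "05b")
        print(address, empty_latches(address))
```

Under these assumptions the model reproduces the summary given in the text (address 00001 leaves latch #1 of the first register empty, 00010 leaves latch #2 empty, and 10000 leaves latch #5 empty), and each of the 32 addresses leaves exactly one latch empty.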


Fig.  4.  Block  diagram  of  portion  of  decoder.  Here  we 
show  five  shift  register  cells  with  their  corresponding 
XOR  gates  and  latches.  XOR  gates  receive  inputs  from 
the  decoder  address  line  and  from  a  shift  register  cell. 


Fig. 5. Block diagram of the decoder. Here we show
six five-bit shift registers loaded with the appropriate
data. Two XOR gates have been provided to cover the
cases of address = 11111 and address = 00000.

Figure 6 shows one possible implementation of the shift
register and XOR gate using the flux-shuttle shift
register proposed earlier in this paper. The XOR gate is
simply the shift register read-out SQUID with an
additional control line (the decoder address line). The
latch is formed by the loop containing inductor Llatch
(see RSFQ storage loop in Fig. 6) [2]. Clocks C1, C2,

Fig. 6. Three-phase version of flux-shuttle shift register.
Here we add an additional control line to the read-out
SQUID of the shift register, creating an XOR gate. An
RSFQ latch is employed for read-out. IRead-out is the
read-out bias line. IAddress is the serial input address
line.


4.  Conclusion 


We have presented simulation studies of key elements
from a high performance multi-gigahertz digital signal
processing system. Simulations of a flux-shuttle shift
register with three-phase sinusoidal clocking show very
large operating margins for shifting and read-out. The
design provides for read-out "on the fly" for frequencies
up to 20 GHz. A serial decoder implemented with a
single comparator per decoded output and shift registers
has been presented. The circuit can be implemented in
either voltage or current (flux transfer) logic. The
compatibility with flux transfer logic in both these
circuits provides for their implementation with high-Tc
superconductors.

Abstract-The prospect of picosecond gate delays, combined
with the peculiarities of superconductive digital circuits,
pushes system architecture design for superconductive
microprocessors into new ground. Several groups have proposed
possible architectures, including systems for the quantum
flux parametron, for modified variable threshold circuits,
and for rapid single flux quantum devices. These architectures
are representative of systems that are, respectively, synchronous
on the gate level, synchronous with ripple logic capability,
and asynchronous, and thus span a large range of possible
solutions. This paper reviews these architectures and
discusses some important issues in choosing an architecture
compatible with superconductive digital circuit technology.


I.  INTRODUCTION 

There are many types of superconductive digital logic
circuits available for use in the design of high-speed microprocessors.
However, this is not sufficient to create a faster computing
system. In parallel to the development of the circuit
technology, it is necessary for system architecture to be
developed as well. In this paper, we present several computer
architecture issues which have been explored for semiconductor
digital circuits, and then examine how they affect
microprocessors implemented in superconductor technologies.

We will first present results on instruction use in general
purpose computers and some techniques used in conventional
high-performance computer architectures. In Section III, we
discuss synchronous and asynchronous computer architectures.
Section IV describes three representative superconductor
circuit technologies and the architecture proposed for each
of them. Finally, we examine superconductive digital systems
in Section V.


II. BACKGROUND

The optimization of digital systems generally includes
cost and performance. For our purposes, we will consider cost
as being the number of gates or area available for the chip. To
get the maximum performance for a given cost, careful study
of the tasks to be performed by the system must be done.
These well known concepts form the basis for the design of
RISC (Reduced Instruction Set Computer) microprocessors.
This section will provide a brief overview of the techniques
used to improve performance in computer architecture today.
It is not intended to be complete, but rather to establish an
informed base from which we can determine appropriate
forms for superconductor microprocessor architectures.

A.  Instruction  Use 

Many studies on instruction usage have been done for
general purpose computers. Table 1 shows the frequency of
occurrence and time spent in execution for five classes of
instructions. The statistics are from a multi-user load on a
VAX 11/780 [1]. The time spent in execution depends on
both the implementation and the architecture of the machine,
and would be different for other computers. However, the
frequencies of occurrence are representative for these types of
tasks. This will be rather uniform for all computers, given the
same suite of tasks. There are also specific benchmarking
programs that test the performance of a variety of task types
including compilers, simulators (e.g., SPICE), and linear system
solvers. Therefore, Table 1 is not complete for design
purposes, but it does show the general distribution of instruction
types, and is quite accurate even for very different tasks.

Table 1. Instruction type distribution

Instruction       Frequency (%)   Time (%)
Move                  31.7           43
Branch                28.7           18
Simple ALU            19.8            5
Floating Point        10.9           11
Call/Return            8.9           23

The most striking feature of Table 1 is that a move is the
most common operation the computer performs. In fact, complex
arithmetic represents only about 11% of the instructions to be
executed. Nearly 70% of the instructions depend on high-speed
memory access, which, for this computer, accounts for
84% of the execution time. In addition, there are typically
fewer than nine instructions between branches. Thus, memory
access is the limiting element in the design of high-performance
computers. These findings have prompted innovative
solutions in conventional computer architecture which
improve performance dramatically.
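
The "nearly 70%" and "84%" figures can be reproduced from Table 1. The short check below assumes, since the text does not list them explicitly, that the memory-dependent classes are Move, Branch, and Call/Return; that grouping is inferred only from the fact that it matches both totals.

```python
# Table 1 values: class -> (frequency %, execution time %)
TABLE_1 = {
    "Move": (31.7, 43),
    "Branch": (28.7, 18),
    "Simple ALU": (19.8, 5),
    "Floating Point": (10.9, 11),
    "Call/Return": (8.9, 23),
}

# Assumption: grouping inferred from the totals quoted in the text.
MEMORY_DEPENDENT = ("Move", "Branch", "Call/Return")


def share(classes, column):
    """Sum one column (0 = frequency, 1 = time) over the given classes."""
    return sum(TABLE_1[name][column] for name in classes)


if __name__ == "__main__":
    print(round(share(MEMORY_DEPENDENT, 0), 1))  # frequency share in percent
    print(share(MEMORY_DEPENDENT, 1))            # execution-time share in percent
```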


B. Caching

To improve memory access time, the most common
solution is to use a cache memory [1]. A cache memory is a
high speed memory used to store a copy of the most
recently accessed memory locations. When a memory access
is requested from a location that is cached, the data is taken
from the cache memory instead of main memory. Since cache
memory is much faster than main memory, the operation does
not take as long as it would have without the cache.

The speed improvement gained by using a cache
depends on how often a memory access request is in a cached
location (the hit ratio) and the access time of both the cache
and main memory. The hit ratio of a cache depends on the
task, the size of the cache, and the size of the main memory.
The success of this technique hinges upon the fact that most
memory accesses are local to a subset of the total memory.
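
The dependence on hit ratio and access times can be made concrete with the usual first-order model, in which a hit costs the cache access time and a miss costs the main-memory access time. This is a simplified sketch, not a formula taken from the paper; the numerical values in the usage example are hypothetical.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Mean access time when a fraction `hit_ratio` of requests hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_main


def speedup(hit_ratio, t_cache, t_main):
    """Improvement relative to a system that always uses main memory."""
    return t_main / average_access_time(hit_ratio, t_cache, t_main)


if __name__ == "__main__":
    # Hypothetical numbers: cache ten times faster than main memory.
    for h in (0.5, 0.9, 0.99):
        print(h, average_access_time(h, 1.0, 10.0), speedup(h, 1.0, 10.0))
```

Even in this simple model the benefit is strongly nonlinear in the hit ratio, which is why locality of memory accesses matters so much.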

C. Pipelining

Another popular method of improving computer performance
is the use of pipelining [1]. Pipelining allows increased
utilization of hardware resources by the partial execution of
more than one instruction at the same time. It is like an automobile
assembly line, where each worker (hardware resource)
performs a task on the car (instruction), completing only part
of the total task. Then, while the car is being worked on in the
next station, the worker starts on the next car, instead of waiting
for the entire car assembly to be completed by all the
workers before continuing. This technique can cause an
increase in latency, but can provide much higher throughput
(latency is the time required to execute a single instruction,
while throughput is the number of instructions that can be executed
in a given time period). One of the most common uses
of pipelining is to fetch the next instruction from memory
while executing the current one.
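
The distinction between latency and throughput can be illustrated with an idealized timing model. The sketch assumes equal stage times, no branch bubbles, and no pipeline-register overhead (the overhead is what increases latency in a real design); the function names and numbers are hypothetical.

```python
def unpipelined_time(n_instructions, n_stages, stage_time):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages * stage_time


def pipelined_time(n_instructions, n_stages, stage_time):
    """Ideal pipeline: the first result takes n_stages steps, then one per step."""
    if n_instructions == 0:
        return 0
    return (n_stages + n_instructions - 1) * stage_time


if __name__ == "__main__":
    n, k, t = 100, 5, 1
    print(unpipelined_time(n, k, t), pipelined_time(n, k, t))
    # The latency of a single instruction is unchanged in this ideal model.
    print(unpipelined_time(1, k, t), pipelined_time(1, k, t))
```

For a long instruction stream the ideal throughput gain approaches the number of stages, while the time for any single instruction is not reduced.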

The degree of pipelining is determined by the increased
cost of additional pipeline registers, clocking frequency, and
diminishing returns due to conditional branch instructions.
Conditional branches require that the operation upon which
the branch depends be completed before the branch target
instructions can be fetched. When these instructions are adjacent,
a "bubble" is created in the pipeline until the instruction
is completed and the new instruction stream can be started in
the pipeline. To reduce the effect of conditional branching in
pipelined architectures, various branch prediction or delayed
branching techniques have been investigated [1].

D.  Parallelism 

Multiple execution units can also be used to improve
performance. This technique uses parallelism to execute multiple
instructions concurrently. The use of multiple execution
units requires special care to avoid the hazards associated with
more than one instruction operating on the same data. The
Tomasulo algorithm and scoreboarding are two schemes that
accomplish this, allowing out-of-order execution and maximum
utilization of resources. Simpler schemes restrict the
degree of parallelism and do not allow out-of-order execution.

All of these techniques are general and do not depend
on technology. This makes them applicable for both semiconductor
and superconductor microprocessors. However, the
effectiveness of each scheme is technology dependent.


III. SYNCHRONOUS AND ASYNCHRONOUS
ARCHITECTURES

Although most computer designs are synchronous,
asynchronous designs have been proposed and small systems
have been made using this approach [2]. Here we discuss the
differences between these two architectures. How they relate
to superconductive circuit technologies is explained in Sections
IV and V.

A.  Synchronous  Architectures 

Synchronous circuits are by far the most common in
digital systems. Ideally, all circuits in a synchronous design
receive a common input simultaneously. This input signal is
designated as a clock. In synchronous design rules, no circuit
is allowed to change the behavior of the clock. This architecture,
therefore, assumes that all inputs and outputs are stable at
the appropriate times. There is no handshaking circuitry; all
delays must be calculated by the designer to be less than the
clock period, with allowances made for clock skew and circuit
margins.

High-speed clock distribution is a critical element in
synchronous circuits. Phase and frequency information must be
transmitted to all the circuits in the system simultaneously. At
high frequencies, the distance a signal travels in a period of
the clock becomes comparable to the dimensions of the circuit
($\lambda$ = 7.5 mm for $f$ = 20 GHz). To meet the distribution
requirements, complex clocking schemes must be devised, such as
load-balanced H-trees. For high frequencies, the clock lines
must be treated as transmission lines as well, creating a
serious impedance matching problem, since the clock has a huge
fanout. Distributing phase information is especially difficult
in systems with multi-phase clocking.
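
The 7.5 mm figure quoted above can be checked with a short calculation. The sketch below assumes that the clock propagates at half the free-space speed of light (for example, a line with an effective relative permittivity of about 4); that velocity factor is an assumption made here for illustration and is not stated in the text. The function name `distance_per_period` is likewise hypothetical.

```python
# Distance a clock edge travels during one clock period.
# ASSUMPTION: propagation at half the free-space speed of light;
# the text gives only the resulting 7.5 mm at 20 GHz.
C0 = 299_792_458.0  # free-space speed of light, m/s


def distance_per_period(freq_hz, velocity_factor=0.5):
    """Return the distance (in metres) travelled in one period at freq_hz."""
    return velocity_factor * C0 / freq_hz


if __name__ == "__main__":
    lam = distance_per_period(20e9)
    print(f"{lam * 1e3:.2f} mm per clock period at 20 GHz")
```

With a different velocity factor the distance scales proportionally, so the comparison with chip dimensions holds for any realistic line.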

B.  Asynchronous  Architectures 

In an asynchronous circuit, a change of the inputs directly
causes a change of the outputs. Thus, all combinatorial circuits
are by definition asynchronous. These circuits have no clock;
timing is provided by the logic circuits themselves through the
use of handshaking signals. To maintain data integrity, the
handshaking signals must guarantee that the input data are
stable while being used and that the outputs remain stable until
they are no longer needed. A complete signal can be incorporated
into the handshaking signals to ensure correct operation. The
complete signal is generated by the combinatorial logic circuits
and is asserted after the output data are valid.

Complex bit-level asynchronous systems are not practical, since
they would triple bus widths. For microprocessors which use
64-bit data paths and run up to 15 busses, this is not
acceptable. Thus, practical complex asynchronous systems
synchronize all data bits with respect to each other.

The structure of asynchronous systems can be much like
synchronous ones: combinatorial logic bounded by registers. The
difference is that instead of using a global external clock, the
timing signals are derived from the handshaking signals in the
circuits themselves. The handshaking protocol used by the
interconnection blocks determines the degree of concurrency that
can be accomplished. For example, a full-handshake allows both
computation blocks on either side of the interconnection block
to operate on different data simultaneously. A half-handshake,
on the other hand, allows only every other logic block to
operate on data concurrently.

A combination of asynchronous circuits can be used to implement
a synchronous system. All conventional computing systems use
this approach. However, a synchronous system cannot be used to
implement an asynchronous one, unless the clock frequency used
is much higher than the speed at which the asynchronous circuit
is to be operated (e.g., asynchronous data transmission between
different computing systems as done via modem).


IV.  THREE  REPRESENTATIVE  TECHNOLOGIES 

A. Quantum Flux Parametron

The Quantum Flux Parametron (QFP) is a current-latching logic
that was developed by Goto et al. [3]. This logic family, which
includes a so-called D-gate, requires an external clock, which
also provides the power to the circuit. The output drive of the
circuit is insufficient to drive the clock input to other
similar circuits. This mandates that the clock be an externally
driven input, and precludes asynchronous operation. Thus, all
systems designed with this logic family must be synchronous. In
addition to this, the inputs to the logic gate must be valid
before the clock input. Therefore, ripple logic cannot be used,
making the system synchronous at the gate level. Due to the
latching nature of the logic and its clocking requirements, QFP
circuits are pipelined at the gate level. A three-phase clock is
necessary to guarantee forward propagation of information.

This logic family can compute only one logic level in each clock
period. Thus, it requires a very high speed clock to utilize the
speed of the logic; the period of the clock must be
approximately equal to a single gate delay. At high frequencies,
clock skew becomes a larger fraction of the clock period,
forcing a reduction in clock frequency to maintain worst-case
circuit margins.

Since it is pipelined at the gate level, QFP pipelines tend to
be very deep, potentially reducing their effectiveness due to
pipeline bubbles mentioned in Section II. To address this
problem, a new computer architecture called the Cyclic Pipelined
Computer (CPC) was designed [3]. This architecture is designed
around a multi-tasking operating system. To avoid the data
dependency problems which cause pipeline bubbles, a task switch
is done at each clock period. This eliminates pipeline bubbles,
except for interdependent tasks. If tasks are allowed to share
memory, a technique like those mentioned for multiple execution
units in Section II must be used. For a pipeline n deep, this
architecture is like an n-way parallel multi-processor computing
system running at 1/n the clock frequency. This architecture
requires n register files and processor states to be maintained
concurrently and, depending on memory access time, may still
have pipeline bubbles.
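
The task-switching argument can be illustrated with a toy model. The sketch below is a simplification assumed here, not the CPC design itself: every instruction is taken to depend on the previous instruction of the same task, and a result is taken to be available `n_stages` cycles after issue. The names `issue_order` and `needs_no_bubbles` are hypothetical.

```python
def issue_order(n_tasks, n_cycles):
    """Round-robin schedule: one task switch per clock period."""
    return [cycle % n_tasks for cycle in range(n_cycles)]


def needs_no_bubbles(n_stages, n_tasks, n_cycles=64):
    """True if no task issues before its previous instruction has completed.

    ASSUMED model: a task's next instruction needs the result of its
    previous one, which leaves the pipeline n_stages cycles after issue.
    """
    last_issue = {}
    for cycle, task in enumerate(issue_order(n_tasks, n_cycles)):
        if task in last_issue and cycle - last_issue[task] < n_stages:
            return False  # the pipeline would have to stall here
        last_issue[task] = cycle
    return True


if __name__ == "__main__":
    print(needs_no_bubbles(n_stages=8, n_tasks=8))  # one task per stage
    print(needs_no_bubbles(n_stages=8, n_tasks=4))  # too few tasks
```

In this model each task issues once every n cycles, which is the sense in which the machine behaves like n processors each running at 1/n of the clock frequency.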

B. Modified Variable Threshold Logic

Modified Variable Threshold Logic (MVTL) [4] was developed at
Fujitsu Laboratories, Ltd. and is a voltage-latching logic. Like
the QFP, it requires externally clocked power, and can therefore
only implement synchronous systems. However, unlike the previous
logic family, the inputs do not have to be stable before the
clock input. Instead, they are allowed only "0" to "1"
transitions after the clock has become active. This allows a
conventional "dynamic CMOS-like" design. Inversion can only take
place at a clock edge, making dual-rail signals necessary for
ripple logic. Since this logic family is voltage-latching with
high impedance loads, RC time constants dominate and
punchthrough can occur with high clock frequencies.

The similarity of MVTL to dynamic-CMOS allows the use of
conventional computer architectures with minor variations. Since
ripple logic is allowed, very high frequency clocks are not
necessary, and in fact cannot be used because of punchthrough.
This reduces the microwave power distribution problem, but also
reduces the potential throughput of the system by reducing the
maximum degree of pipelining.

C. Rapid Single Flux Quantum Logic

Rapid Single Flux Quantum Logic (RSFQ) developed by Likharev et
al. [5] is a pulse-based logic. Timing signals are used to
create a timing window in which the arrival of a flux quantum
$\Phi_0$ is interpreted as a logic "1", and no arrival as a "0".
All external biases are dc. Timing signals are also single flux
quanta, allowing the circuits to drive their own clocks. This
allows RSFQ circuits to implement both synchronous and
asynchronous systems.

Signal propagation is accomplished through biased Josephson
transmission lines (JTL) and by microstrip superconducting
transmission lines. JTL's provide isolation and amplification,
but require more area and power. Microstrip transmission lines
have high impedance compared to a Josephson junction, and pose a
termination problem. Since
the  signals  of  interest  are  picosecond  pulses  of  only  about 
lO"**  J,  maximum  energy  transfer  is  crucial  for  large  margins, 
and reflections both reduce the transferred energy and, without
an isolation buffer, can cause errors in the driving circuit.
Also, transmission lines with sharp corners can act as radiating
antennas, potentially losing the picosecond pulse signal
entirely, and causing severe cross-talk.

The proposed RSFQ handshake circuit does not depend on the data
being transferred, and does not produce a complete signal.
Therefore, although the protocol is time-independent, its
implementation is not hazard-free: the input data may not be
valid when the timing pulse arrives at the next logic block.
Therefore, while the timing windows proposed in RSFQ may be
sufficient for some forms of asynchronous circuits, they do not
guarantee correct operation.

A more robust implementation is necessary for a general
interconnection circuit. The key is to provide a complete
signal. This may be accomplished by redefining the "0" value.
One solution is to use a data encoding scheme, wherein the logic
levels "0" and "1" are encoded onto two separate lines.
Dual-rail logic is sufficient for this purpose. This encoding
can also represent the timing signal request by a simple OR of
the two data signals. The OR gate can be implemented by a simple
confluence buffer in RSFQ, since it does not require additional
timing signals. In the case where there are many data lines in a
bus, inverters can be used to create both polarities of the
output for generation of the complete signal, while only one
polarity of the data is actually transmitted to reduce the
number of interconnections.
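
The dual-rail idea can be sketched in a few lines. In the model below, which is an illustration assumed here rather than a circuit description, a pulse is represented by 1 and the absence of a pulse by 0; the per-bit request is the OR of the two rails, as described above. Combining the per-bit requests of a whole word with a logical AND is an assumption of this sketch, and the names `encode_dual_rail`, `bit_request`, and `word_complete` are hypothetical.

```python
IDLE = (0, 0)  # no pulse on either rail: this bit has not arrived yet


def encode_dual_rail(bits):
    """Encode each bit as (true_rail, false_rail); exactly one rail fires."""
    return [(1, 0) if bit else (0, 1) for bit in bits]


def bit_request(rails):
    """Per-bit request signal: the OR of the two data rails."""
    true_rail, false_rail = rails
    return true_rail | false_rail


def word_complete(word):
    """ASSUMED word-level completion: every bit has signalled its arrival."""
    return all(bit_request(rails) for rails in word)


if __name__ == "__main__":
    word = encode_dual_rail([1, 0, 1])
    print(word_complete(word))               # all three bits present
    print(word_complete(word[:2] + [IDLE]))  # last bit still missing
```

The point of the encoding is that a logic "0" now produces a pulse of its own, so its arrival can be distinguished from data that have not yet arrived.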




V.  SUPERCONDUCTIVE  DIGITAL  SYSTEMS 

General purpose computers depend heavily on random access memory
(RAM), as shown in the instruction usage studies. A fast cache
memory must be available for data, instructions, and some branch
prediction algorithms. High speed memory is also necessary for
register files, often containing up to 32 integer registers of
32 bits each in addition to 16 floating point registers of 64
bits each.

Superconductive memories have typically been inadequate to the
demands of general purpose computing. However, the proposed
Josephson junction/CMOS hybrid memories are promising for a
high-speed low-power large main RAM. It is likely, however, that
faster caches will still be necessary to provide the required
memory performance and fully utilize a superconducting central
processing unit in a computer system.

Hybrid architectures can also be used. The system may be
synchronous at the highest levels, but include asynchronous
blocks of logic to improve performance of certain operations.

Digital signal processing (DSP) covers a wide range of digital
systems. DSP can require flow control, arithmetic, memory, and
programmability. Thus, it has all the components of a general
purpose computer. In fact, many DSP algorithms are run on
general purpose computers. However, DSP can also be done on
data-flow architectures and systolic arrays, providing a full
range of architectures from the most programmable (general
purpose computers) to fixed data flow.

DSP microprocessors usually operate on data streams. Thus, RAM
can be replaced with a more dedicated memory structure, such as
a shift register. Memory in DSP can also often be traded off
with I/O bandwidth. This makes DSP systems attractive for
implementation in superconductive electronics, which has poor
memory capability.

Most current DSP architectures support multiply/accumulate
operations, although multiply/(max, min) operations are useful
as well. Applications where the multiply/accumulate operation is
used are convolution and finite impulse response (FIR) and
infinite impulse response (IIR) digital filtering. The
algorithms can be adjusted to place the delays used in the
convolution sum in different parts of the hardware. Thus, the
algorithms can be tailored to fit the superconductor technology.
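
As an illustration of moving the delays of a convolution sum, the sketch below compares two standard FIR structures: the direct form, where a shift register delays the input samples, and the transposed form, where the delays sit between the accumulators. The choice of these two structures as the example, and the function names `fir_direct` and `fir_transposed`, are assumptions made here; the text does not name specific structures.

```python
def fir_direct(coeffs, samples):
    """Direct form: a shift register holds the delayed input samples."""
    delay_line = [0] * len(coeffs)
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift in the new sample
        out.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return out


def fir_transposed(coeffs, samples):
    """Transposed form: registers hold delayed partial sums instead."""
    n = len(coeffs)
    state = [0] * n  # state[i] is a partial sum delayed by one sample
    out = []
    for x in samples:
        out.append(coeffs[0] * x + state[0])
        # each register takes the next register's value plus a new product
        state = [coeffs[i + 1] * x + state[i + 1] for i in range(n - 1)] + [0]
    return out


if __name__ == "__main__":
    taps = [1, 2, 3]
    data = [1, 0, 0, 2, 5]
    print(fir_direct(taps, data))
    print(fir_transposed(taps, data))
```

Both functions compute the same multiply/accumulate sum; only the location of the storage elements differs, which is the kind of freedom the paragraph above refers to.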

VI.  SUMMARY 

We have presented a brief overview of computer architecture
issues and considered three different superconducting
technologies and the computer architectures which are suitable
to each.

The QFP logic family requires a very high-speed external clock,
and must be synchronous. The system is also pipelined at the
gate level. The deep pipeline and high clock speed necessary
make full utilization of this logic family difficult for most
digital systems. However, the QFP is a very sensitive
comparator, and may play an important role in a hybrid system.

MVTL is a mature logic family with large margins, but requires a
synchronous system. Since very high-speed synchronous systems
are limited by clock distribution, the additional constraint
imposed by punchthrough for MVTL may not be the deciding factor
in clock speed determination. Ripple logic capability allows
minimum latency regardless of the clock frequency. While MVTL
will be limited in the depth of its pipeline (as will all
clock-skew-limited synchronous systems), for general purpose
computing, this will not be a major issue due to the diminishing
returns associated with deeper pipelines.

Asynchronous computer architecture has the most potential for
high performance. When clock distribution makes high-speed
synchronous systems impossible, an asynchronous solution is the
logical choice. RSFQ logic is compatible with both synchronous
and asynchronous architectures, but more suited to the latter.
Asynchronous circuits implemented in RSFQ logic have potentially
very high throughput. When designing asynchronous systems in
RSFQ, care must be taken with handshaking signals and pulse
transmission.

General  purpose  computing  is  unlikely  in  the  near 
future  for  systems  composed  entirely  of  superconductive  com¬ 
ponents  due  to  RAM  and  cache  requirements.  Hybrid  techno¬ 
logies  of  Josephson  junctions  and  CMOS  may  alleviate  some 
of  the  memory  problems. 

In digital signal processing, random access memory can be
replaced by more dedicated memory structures. In addition, a
data flow architecture with high levels of pipelining can be
selected to maximize throughput. With these choices, DSP is the
most likely prospect for implementation with superconducting
digital electronics.

An Efficient Method for Finding dc Solutions
for Josephson Circuits

Abstract: A dc solution program is very useful for finding
operating points and dc transfer characteristic curves of
Josephson circuits in the superconducting state. In this paper,
we will discuss the formulation of Josephson circuit equations
in the dc state and propose a mixed-mode approach that combines
the nonlinear solution method of source-stepping and
time-domain method of numerical integration. Josephson circuit
equations are often multivalued, which implies the existence of
multiple solutions. When the paths taken by the independent
sources are specified, only one of the many possible solutions
can be physical. The mixed-mode algorithm follows the paths of
the independent sources, detects ill-conditioned points, and
converges to stable points on the characteristic curves of the
simulated circuit. The algorithm has been implemented, and case
studies are presented. The method and techniques presented are
suitable for implementing dc analysis options in a general
circuit simulator.


I.  Introduction 

DC ANALYSIS is an integral part of circuit simulation. Although
any dc analysis problem can be treated as a single or multiple
transient problem, the cost can be very high. For this reason,
computer-aided design (CAD) programs, like the SPICE program
[1], have dc analysis options such as dc operating point and dc
transfer curve as essential features. The work presented in this
article is a continuation of the effort on the Josephson circuit
simulator (JSIM) [2]. The goal of JSIM is efficient and fast
circuit simulation for Josephson circuit applications,
especially for large circuits. JSIM has achieved an order of
magnitude speed improvement over JSPICE2 [3] in simulation of
medium-sized circuits. All existing Josephson circuit
simulators, including JSIM, can only simulate transient circuit
behavior; dc analysis is not an allowable option in these
programs. The SPICE program is often the basic platform for
Josephson circuit simulation programs [3]-[5], but unfortunately
the dc analysis methods in SPICE cannot be adapted to Josephson
circuits.

SPICE uses the Newton-Raphson iteration method to solve systems
of nonlinear equations. Every iteration method requires an
initial guess of the solution to start the iteration process.
The guessed solution has to be close enough to the true solution
for the iteration to converge. In semiconductor circuits, the
main nonlinearity is the exponential relation of voltage to
current. A typical example is the diode equation in which the
diode current is an exponential function of voltage. The main
difficulty encountered in solving equations of exponential
functions is numerical overflow. Two methods, limiting and
source stepping, have been used to overcome this problem. The
simple limiting method used in SPICE restricts the voltage
change across a diode-like device from one iteration to the
next. The source-stepping method [6] is a more general approach
to finding a starting point but has a greater computational cost
than the limiting method.

In the source-stepping method, we want to find the unknown
vector $x$ such that $F(x) = y$ is satisfied. The variable $y$
is the source vector, the value of which is known. We can
parametrize $x$ and $y$ by letting $y = y(s)$, then $x = x(s)$.
We then discretize $s$ and solve for $x$ at each discrete point
using an iteration method such as the Newton method. The initial
guess of $x$ can be extrapolated from previous points; the
simplest extrapolation would be using the value of $x$ at the
last point. This parametrization permits solution of
path-dependent problems, that is, the solution is dependent on
the path taken by the source vector.
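
A minimal scalar sketch of source stepping with Newton iteration follows. The test function `f` is a hypothetical monotonic example chosen here so that no ill-conditioned points arise; it is not a circuit from the paper, and the helper names `newton` and `source_step` are likewise assumptions. The source is ramped linearly, $y(s) = s\,y_{final}$, and the previous solution is reused as the next initial guess, which is the simplest extrapolation mentioned above.

```python
import math


def newton(f, dfdx, y, x0, tol=1e-12, max_iter=50):
    """Solve f(x) = y by Newton iteration starting from the guess x0."""
    x = x0
    for _ in range(max_iter):
        residual = f(x) - y
        if abs(residual) < tol:
            return x
        x -= residual / dfdx(x)
    raise RuntimeError("Newton iteration did not converge")


def source_step(f, dfdx, y_final, n_steps=20, x_start=0.0):
    """Ramp the source from 0 to y_final, solving for x at each grid point."""
    x = x_start
    path = []
    for k in range(1, n_steps + 1):
        y = y_final * k / n_steps  # y = y(s) with s = k / n_steps
        x = newton(f, dfdx, y, x)  # last solution is the initial guess
        path.append((y, x))
    return path


# Hypothetical single-valued test function: its slope is always positive.
def f(x):
    return math.sin(2 * math.pi * x) + 10.0 * x


def dfdx(x):
    return 2 * math.pi * math.cos(2 * math.pi * x) + 10.0


if __name__ == "__main__":
    for y, x in source_step(f, dfdx, 5.0, n_steps=5):
        print(f"y = {y:.1f}  x = {x:.6f}")
```

Because this test function is single-valued, the result does not depend on the path; the path dependence discussed in the text only appears when the characteristic folds back on itself.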

In  Josephson  superconductive  circuits,  the  nonlinearity  is 
nearly  sinusoidal.  The  circuit  equations  are  often  multival¬ 
ued,  and  therefore,  the  solution  is  path  dependent.  The 
source-stepping  method  is  a  natural  approach  but  is  not 
fail-safe.  As  pointed  out  above,  the  convergence  of  an  itera¬ 
tion  method  depends  on  the  difference  between  the  initial 
guess  and  the  true  solution.  Using  the  source-stepping  method, 
the  initial  guess  can  be  arbitrarily  close  to  the  true  solution  by 
using  an  arbitrarily  small  grid  in  s,  provided  that  the  solution 
vector x with respect to the source vector y is continuous. In
equations  involving  sinusoidal  functions,  this  continuity  re¬ 
quirement  is  often  not  satisfied,  and  source  stepping  may  not 
give  the  correct  solution  beyond  the  discontinuity  or  ill-condi¬ 
tioned  point.  In  addition,  the  source  vector  may  enter  a 
region  where  no  dc  solution  exists  (referred  to  as  the  voltage 
state  in  Josephson  circuits);  a  dc  solution  method  must  be 
able  to  detect  the  crossing  of  such  a  boundary.  In  this  paper, 
we  discuss  the  Josephson  circuit  equation  formulation  for  dc 
solutions,  and  present  an  efficient  method  for  dc  analysis, 
namely  dc  operating  point  and  dc  transfer  curve  calculations. 
Also,  techniques  will  be  presented  to  treat  ill-conditioned 
points,  and  to  detect  the  crossing  into  the  voltage  state. 

II. The SQUID Threshold Problem and General dc Analysis

The majority of the works on dc analysis of Josephson circuits
have concentrated on finding threshold curves of multi-junction
superconductive quantum interference devices (SQUIDs). The
threshold curve is the locus of ill-conditioned points and
voltage-state boundary points, which are extremum points. The
ill-conditioned points are local maxima, and voltage-state
boundary points are global maxima. Circuit equations are based
on the Kirchhoff current law (KCL), and constraint equations are
derived to determine the extremum points. Tsang and Van Duzer
[7], Landman [8], and Peterson and Hamilton [9] use the
Lagrangian multiplier method for determining the extremum
points, while Schulz-DuBois and Wolf [10] use the Gibbs free
energy. To find the threshold curve, Tsang and Van Duzer, as
well as Peterson and Hamilton, search in phase space to find
solutions. Landman and Schulz-DuBois employ curve tracing
techniques by first finding a starting point on one lobe of the
threshold curve and then tracing out the rest of the lobe. The
curve-tracing technique is very similar to the source-stepping
method; they are both based on the mathematical technique of
parametrization and differentiation of a smooth continuous
function. The starting points are found by searching in phase
space and using prior knowledge of the SQUID under calculation.

The  topic  of  SQUID  behavior  is  important  in  Josephson 
circuits.  The  work  mentioned  above  has  provided  adequate 
ways  to  understand  multi-junction  SQUID  behavior.  The 
SQUID  threshold  curve  problem  is  not  the  primary  concern 
of  this  paper.  Determination  of  dc  operating  points  and  dc 
transfer  curves  requires  us  to  calculate  the  values  of  branch 
currents  and  junction  phases  with  a  given  set  of  source 
vectors  for  the  circuit  under  simulation,  which  may  be  very 
large.  This  is  a  different  problem  from  determination  of 
SQUID  thresholds.  Furthermore,  the  techniques  of  searching 
in  phase  space  for  possible  solutions  cannot  be  applied  to 
large  circuits,  because  the  computational  cost  typically  in¬ 
creases  exponentially  with  the  dimension  of  the  space  to  be 
searched,  which  is  proportional  to  the  number  of  junctions. 

The  dc  state  of  a  Josephson  circuit  is  completely  deter¬ 
mined  if  all  circuit  branch  currents  and  junction  phases  are 
known;  therefore,  a  dc  solution  method  for  Josephson  cir¬ 
cuits  must  be  able  to  solve  the  problems  of  ill-conditioned 
points  and  crossing  into  the  voltage  state.  The  method  we 
present  here  can  be  adapted  to  find  SQUID  or  other  SQUID- 
like  device  threshold  curves.  A  brief  discussion  on  the  adap¬ 
tation  will  be  given. 


III. Josephson Devices in the dc State

The Josephson junction is modeled by the Josephson element in
parallel with a capacitor and a nonlinear resistor as shown in
Fig. 1. The Josephson element is described by the Josephson
equations

$$ I = I_c \sin\phi \tag{3.1} $$

$$ \frac{d\phi}{dt} = \frac{2\pi}{\Phi_0} v \tag{3.2} $$

where $\phi$ is the element's phase, $v$ is the voltage across
the element, and $\Phi_0 = 2.07 \times 10^{-15}$ Wb is called
the flux quantum. The quasi-static I-V curve of a typical tunnel
junction is shown in Fig. 2. When the critical current $I_c$ of
a junction is exceeded, the voltage across the junction becomes
nonzero; the time-average voltage is the intersection point of
the load line with the I-V curve. When the junction is in the
voltage state, there is no dc solution since the variables are
oscillating. However, there can be a dc zero-voltage state
solution if the supply current is less than the junction
critical current. Furthermore, if the junction is shunted with
an inductor to form a one-junction SQUID, the steady-state
voltage across the junction always will be zero. In dc analysis,
the junction capacitance and nonlinear resistance are not
included, since they affect only transient behavior.

Fig. 1. Circuit symbol and model of a Josephson junction.

Fig. 2. Quasi-static I-V curve of a Josephson junction, with a load line.

IV. Equation Formulation Using Nodal Analysis Method


In Josephson technology, there is a class of zero-voltage state
logic circuits that consist solely of junctions and inductors.
Knowing the dc operating point and dc transfer curve of such a
circuit is essential in the design process. If we use nodal
analysis and write down KCL equations at each node, we get, in
general, a system of nonlinear equations of the form

$$ F(x) = y \tag{4.1} $$

where $x = [x_1 \cdots x_i \cdots x_n]^T$ and
$y = [y_1 \cdots y_i \cdots y_n]^T$. The variable $x_i$ is the
unknown variable at node $i$, and $y_i$ is the source variable
at node $i$. In order to use the Newton method to find the
solution, (4.1) must be linearized to

$$ J(x^k)\, x^{k+1} = y - F(x^k) + J(x^k)\, x^k \tag{4.2} $$

where $x^k$ is the value of $x$ at the $k$th iteration step, and
$J(x^k)$, where $J_{ij} = \partial F_i / \partial x_j$, is the
Jacobian of $F$ evaluated at the $k$th step. To find $x^{k+1}$,
we only need to solve a system of simultaneous equations of the
form

$$ A x^{k+1} = b \tag{4.3} $$

with $A = J(x^k)$ and $b = y - F(x^k) + J(x^k)\, x^k$.

For our equation formulation, we assume there are only four
types of devices: junctions, inductors (coupled or uncoupled),
independent current sources, and flux-controlled current
sources. If we let $e_i$ be the node voltage referenced to
ground at node $i$ and define $x_i = \int e_i \, dt / \Phi_0$,
which makes $x_i$ a normalized flux variable, we can write down
the entries to (4.1) and (4.3) by inspection using the nodal
analysis templates given in the Appendix as a reference. The
inspection method is identical to that discussed in standard CAD
textbooks (see, for example, Chua and Lin [11]), with resistors
replaced by inductors and voltage replaced by normalized flux.
Since there are no mutually coupled resistors, the case of
mutually coupled inductors must be treated here.

Shown in Fig. 3(a) is a pair of mutually coupled inductors with
primary inductance $L_p$, secondary inductance $L_s$, and mutual
inductance $M$. The relation between flux and current is given
by

$$ \Phi_0 x_p = L_p i_p + M i_s \tag{4.4} $$

$$ \Phi_0 x_s = M i_p + L_s i_s. \tag{4.5} $$


To write the current variables in terms of the flux variables,
(4.4) and (4.5) must be inverted. This is not easy if we want to
be able to write down the KCL equations by inspection,
especially if the mutual coupling involves more than two
inductors. Furthermore, for an ideally coupled pair (i.e.,
$L_p L_s = M^2$), the relation is singular and not invertible.
Instead, the coupled pair is replaced by an equivalent two-port
network of uncoupled inductors and flux-controlled current
sources, as shown in Fig. 3(b). Derivation of the parameter
values is straightforward, and they are given by


[Parameter-value equations for the equivalent network of
Fig. 3(b), in terms of $L_p$, $L_s$, $M$, $a_1$, and $a_2$.]


The values of $a_1$ and $a_2$ can be chosen arbitrarily; the
simplest choice is $a_1 = a_2 = 1$. Now the templates for the
uncoupled inductor and flux-controlled current source can be
used to write down entries to the KCL equations. The extension
to coupled triplets can be done easily.


V.  The  One-Junction  SQUID  Problem 

The circuit shown in Fig. 4 is a one-junction SQUID. Using the
formulation described earlier, the nodal equation can be written
as

$$ I_c \sin 2\pi x_1 + \frac{\Phi_0}{L} x_1 = I_s. \tag{5.1} $$


Fig. 3. (a) A mutually coupled pair. (b) An equivalent two-port representation.


Fig.  4.  The  one-junction  SQUID  circuit. 


Fig.  5  is  a  plot  of  the  one-junction  SQUID  characteristic 
curve  with  =  1.  Starting  at  the  origin  O,  we  raise 

$I_s$ past $I_A$ to $I_0$; the correct operating point should be
$S_1$, even though points $S_2$ and $S_3$ also satisfy (5.1).
The point $S_2$ is not a possible operating point because it is
on the unstable CD section of the characteristic curve. To get
to $S_3$, $I_s$ must be raised past C and then lowered back to
$I_0$. If the source-stepping method is used in conjunction with
Newton iteration, we can expect to encounter two difficulties.
The first one is nonconvergence; that is, the stepping cannot
pass point A, and one possible scenario is illustrated in
Fig. 6, which shows a sequence of iterations. Point a on the
curve is the initial guess; the iterated solution lands on point
d after three iterations. The iterations oscillate around the
hump. The iterated solution cannot pass much beyond the hump if
the iteration is carried on further, while the true solution
lies far away. The second problem is convergence to a wrong
solution. The tangent line near A in Fig. 5 is almost
horizontal, so the intercept of the tangent with the
$I_s = I_0$ line can occur at almost any place along the line
regardless of the distance between $I_0$ and $I_A$. If the
method does converge to a solution, it could be any one of the
three points. This is a worse problem than nonconvergence
because, with the source-stepping method, we cannot easily
distinguish the correct solution from incorrect ones. The
discontinuity in $x_1$ with respect to $I_s$ causes the first
problem, and the multivalued nature of the sinusoidal function
causes the second one.

Fig. 5. Characteristic curve of a one-junction SQUID, where  =

Fig. 6. Possible steps of a nonconvergent Newton iteration.

These problems can be avoided by introducing a resistor R
between node 1 and ground. As soon as $I_s$ exceeds $I_A$, the
operating point jumps to the BC section of the curve in Fig.
5. Since $x_1$ is directly proportional to the current in the
inductor, and current in an inductor cannot change
instantaneously, the dynamics during the transition period should be
described by

$$I_c \sin 2\pi x_1 + \frac{\Phi_0}{L} x_1 + \frac{\Phi_0}{R} \dot{x}_1 = I_s \qquad (5.2)$$

where, by definition, $\dot{x}_1 \Phi_0$ = voltage at
node 1. Now numerical integration can be used to solve the
problem; the dc solution for $x_1$ is reached when $\dot{x}_1$
approaches zero. The waveform of $\dot{x}_1$ is a pulse, which
initially rises and then decays to zero. Since the absolute time
dependence of $x_1$ is of little concern, the value of R can be
chosen arbitrarily. The effective incremental inductance seen
by the resistor R can be derived by taking the derivative of
(5.1) with respect to time, and is found to be
$\Phi_0 / (2\pi I_c \cos 2\pi x_1 + \Phi_0/L)$. The freedom to set R gives us
some control over the effective time constant, which we will use
to our advantage later.
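The damped equation (5.2) can be tried out numerically. The following is a minimal sketch, not the authors' implementation: it uses the Fig. 7 values ($I_c$ = 1 mA, L = 2.07 pH, so $\Phi_0/L$ = 1 mA), works in mA, and picks the damping `g` = $\Phi_0/R$ and the pseudo-time step `h` arbitrarily. The function name `settle` and all default values are assumptions made for illustration.

```python
import math

PHI0_OVER_L = 1.0  # Phi_0 / L in mA (Phi_0 = 2.07e-15 Wb, L = 2.07 pH)
I_C = 1.0          # critical current in mA

def settle(x0, i_s, g=1.0, h=0.05, eps=1e-9, max_steps=100000):
    """Integrate (5.2) by backward Euler from x0 until |x_dot| < eps.

    g is Phi_0/R (arbitrary, assumed) and h is the pseudo-time step.
    h is chosen so that g/h exceeds 2*pi*I_C - PHI0_OVER_L, which keeps
    the implicit equation monotonic and the Newton iteration well behaved.
    """
    x = x0
    for _ in range(max_steps):
        x_new = x  # Newton initial guess: value at the previous step
        for _ in range(100):
            f = (I_C * math.sin(2 * math.pi * x_new) + PHI0_OVER_L * x_new
                 + g * (x_new - x) / h - i_s)
            df = (2 * math.pi * I_C * math.cos(2 * math.pi * x_new)
                  + PHI0_OVER_L + g / h)
            dx = -f / df
            x_new += dx
            if abs(dx) < 1e-12:
                break
        x_dot = (x_new - x) / h
        x = x_new
        if abs(x_dot) < eps:
            return x
    raise RuntimeError("x_dot did not decay to the tolerance")

# Source below the hump: the operating point stays on the first branch.
x_low = settle(0.0, 1.2)
# Source raised past the hump: the transient carries x to the next stable branch.
x_high = settle(x_low, 1.5)
```

Because the backward Euler step is monotonic for this choice of `h`, the transient stops at the first stable equilibrium it meets, which is the physically reachable one.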

VI.  The  Pseudo-Time-Domain  Method 

We observe from the one-junction SQUID example that
traversing past an ill-conditioned point requires a
pseudo-time-domain analysis. We call it a pseudo-time-domain
method because the relation of flux and phase to time is
irrelevant. The method can be extended to the general case of
(5.1) by introducing a resistor to ground at every node. The
general KCL equation becomes

$$F(x(t)) + G\dot{x}(t) = y(t) \qquad (6.1)$$

where G is a normalized diagonal conductance matrix,
$\mathrm{diag}(G) = [\cdots\ \Phi_0/R_i\ \cdots]$, and $R_i$ is the resistance from
node i to ground. Any A-stable numerical integration method,
such as the backward Euler or trapezoidal method, can be used
to solve the equation. Assuming the backward Euler method
is used, (6.1) at time step n + 1 with a time increment of
$h_{n+1}$ becomes

$$F(x[n+1]) + \frac{1}{h_{n+1}} G\,(x[n+1] - x[n]) = y[n+1]. \qquad (6.2)$$

The effective incremental time constants of (6.1) are the
eigenvalues of the matrix $GJ^{-1}(x)$, which are time varying.
The differential equation is, in general, stiff, and therefore
an implicit integration method, such as backward Euler,
is required [12]. The dc solution is assumed to have been
reached when $\dot{x}$ is less than some predefined small value.
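The backward Euler form (6.2) with a Newton solve at each step can be sketched for a small dense system as follows. This is an illustrative sketch only: the helper names (`solve_linear`, `pseudo_transient`), the step size, the tolerances, and the two-node test circuit with its parameter values are all assumptions, and a production simulator would use sparse matrix techniques instead of dense elimination.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def pseudo_transient(F, J, g, y, x0, h=0.05, eps=1e-9, max_steps=100000):
    """Integrate F(x) + G x_dot = y by backward Euler until max|x_dot| < eps.

    g holds the diagonal of G. Each step solves (6.2) by Newton iteration:
    (J + G/h) dx = y - F(x) - G (x - x_old) / h.
    """
    n = len(x0)
    x_old = list(x0)
    for _ in range(max_steps):
        x = list(x_old)
        for _ in range(50):
            f, jac = F(x), J(x)
            a = [[jac[i][k] + (g[i] / h if i == k else 0.0) for k in range(n)]
                 for i in range(n)]
            r = [y[i] - f[i] - g[i] * (x[i] - x_old[i]) / h for i in range(n)]
            dx = solve_linear(a, r)
            x = [x[i] + dx[i] for i in range(n)]
            if max(abs(d) for d in dx) < 1e-13:
                break
        x_dot = [(x[i] - x_old[i]) / h for i in range(n)]
        x_old = x
        if max(abs(v) for v in x_dot) < eps:
            return x
    raise RuntimeError("x_dot did not decay to the tolerance")

# Hypothetical two-node circuit: a junction from each node to ground and an
# inductor between the nodes (currents in mA, K = Phi_0 / L).
K, I_C1, I_C2 = 1.0, 1.0, 1.0

def F(x):
    return [K * (x[0] - x[1]) + I_C1 * math.sin(2 * math.pi * x[0]),
            K * (x[1] - x[0]) + I_C2 * math.sin(2 * math.pi * x[1])]

def J(x):
    return [[K + 2 * math.pi * I_C1 * math.cos(2 * math.pi * x[0]), -K],
            [-K, K + 2 * math.pi * I_C2 * math.cos(2 * math.pi * x[1])]]

y = [0.5, 0.0]
x_dc = pseudo_transient(F, J, g=[1.0, 1.0], y=y, x0=[0.0, 0.0])
```

With `g[i] / h` larger than the most negative diagonal entry of the Jacobian, the matrix `J + G/h` stays diagonally dominant, which is the property the text relies on.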

The IBM ASTAP program [13] also uses a time-domain
approach, called the pseudo-dc scheme, for dc analysis of
semiconductor circuits. To find the dc solution, ASTAP inserts an
inductor in series with every voltage source and a capacitor
in parallel with every current source, and a transient analysis
is performed similar to that described above. It is called a
pseudo-dc method because the numerical integration error is
not controlled and the time step is taken as large as possible,
consistent with the convergence of the Newton iteration.
Convergence to a dc solution is guaranteed if the integration
method is stable.

One difference between our time-domain method and that
of ASTAP is that the matrix G, representing the additional
circuit elements introduced, is always diagonal. This will
neither increase the size of nor significantly decrease the
sparsity of the matrix A, where $A = J + \frac{1}{h_{n+1}} G$ and J is
the Jacobian of F. Rather, A can be made diagonally
dominant, which improves numerical stability when
solving Ax = b. Another difference is that the integration error
has to be controlled to a certain degree in our situation
because, while the ASTAP pseudo-dc method guarantees
convergence to a solution, it may not be the correct solution.
In the case of the one-junction SQUID example discussed
above, there are three possible solutions. Only one of them is
the correct one, whereas any one of the three may be reached
using the pseudo-dc scheme. In transient and frequency
analysis, this is commonly referred to as numerical aliasing. In
the transient simulator JSIM [2], this problem was solved by
limiting the phase change of each Josephson junction from
one step to the next. For a junction between node i+ and
i-, the phase is simply given by $\phi_i = 2\pi[x_{i+} - x_{i-}]$, so if
$\phi_i[n+1] - \phi_i[n]$ is more than $\pi$, the transient time step
size has to be reduced. The computational cost of the
pseudo-time-domain method is, in general, higher than for
the source-stepping technique. If this method were used to
find N points on a dc transfer curve, N transient analyses
would have to be performed, and the cost may be very high for
large N.
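The phase-change limit described above can be sketched as a step-acceptance test. This is not JSIM's actual code: the function names, the step-halving policy, and the data layout (a list of normalized node fluxes, with `None` standing for ground) are assumptions chosen for illustration.

```python
import math

def phase_step_ok(x_prev, x_new, junctions, limit=math.pi):
    """Return True if no junction phase changed by more than `limit`.

    x_prev, x_new: normalized node fluxes at steps n and n+1.
    junctions: list of (i_plus, i_minus) node indices; None means ground.
    """
    def flux(x, node):
        return 0.0 if node is None else x[node]

    for i_plus, i_minus in junctions:
        phi_prev = 2 * math.pi * (flux(x_prev, i_plus) - flux(x_prev, i_minus))
        phi_new = 2 * math.pi * (flux(x_new, i_plus) - flux(x_new, i_minus))
        if abs(phi_new - phi_prev) > limit:
            return False
    return True

def advance(step, x, h, junctions, h_min=1e-12):
    """Take one step with step(x, h), halving h until the phase limit holds."""
    while h >= h_min:
        x_new = step(x, h)
        if phase_step_ok(x, x_new, junctions):
            return x_new, h
        h *= 0.5  # assumed policy: simple halving of the rejected step
    raise RuntimeError("time step underflow")
```

Here `step` is any one-step integrator, for example one backward Euler step of (6.2).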


VII. The Mixed-Mode Method

We describe here a cost-minimizing procedure that combines
the source-stepping and transient analysis methods. It
was observed above that the source-stepping method works
well provided that the solution x is a continuous function of
the source y. There is no reason to use a transient analysis
except near an ill-conditioned point. The mixed-mode method
uses source stepping until a possibly ill-conditioned point is
detected; then a transient analysis is performed by adding
resistance to ground at each node. Upon completion of the
time-domain calculation, the program reverts to source
stepping until the next possibly ill-conditioned point is detected.
Suppose after j steps of the source-stepping process, we get
$x = x^j$ for $y = y^j$. At the (j + 1)th step, (4.1) is solved by
iteration using (4.2); both are restated here for convenience:

$$F(x^{j+1}) = y^{j+1} \qquad (7.1)$$

$$J(x^{j+1,k})\,x^{j+1,k+1} = y^{j+1} - F(x^{j+1,k}) + J(x^{j+1,k})\,x^{j+1,k}. \qquad (7.2)$$

During the iteration process, we may find either of two
occurrences that indicate possibly ill-conditioned points. In
one case, after solving (7.2) some prescribed number of
times, the iteration process still cannot be terminated. This
would indicate possible nonconvergence, which usually
occurs near ill-conditioned points. Another case is where
$\|x^{j+1,k+1} - x^{j+1,k}\|$, which is the change from one
iteration step to the next, is large. When either situation exists,
we switch to solving

$$F(x^{j+1}(t)) + G\dot{x}^{j+1}(t) = y^{j+1} \qquad (7.3)$$

with initial condition $x^{j+1}(0) = x^j$; hence $\dot{x}^{j+1}(0) =
G^{-1}(y^{j+1} - y^j)$. Assuming that backward Euler integration
and Newton iteration are used to solve (7.3), we get

$$A\,x^{j+1,k+1}[n+1] = b \qquad (7.4)$$

where

$$A = J(x^{j+1,k}[n+1]) + \frac{1}{h_{n+1}} G$$

$$b = y^{j+1} + \frac{1}{h_{n+1}} G\,x^{j+1}[n] - F(x^{j+1,k}[n+1]) + J(x^{j+1,k}[n+1])\,x^{j+1,k}[n+1]$$

and $x[n] = x(t_n)$. If we pick $h_{n+1}$ small enough, the matrix
A is guaranteed to be nonsingular, and can even be diagonally
dominant. The transient analysis is stopped when $\dot{x}^{j+1}$ has
decayed to less than $\epsilon$, where $|G\epsilon|$ is within the
error tolerance.
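The switching logic can be illustrated on the one-junction SQUID with the Fig. 7 values ($I_c$ = 1 mA, $\Phi_0/L$ = 1 mA). The sketch below is an assumption-laden illustration, not the authors' program: the iteration limit, the jump threshold `max_jump`, the damping `G`, the step `H`, and the source step of 0.05 mA are all arbitrary choices.

```python
import math

K = 1.0            # Phi_0 / L in mA (L = 2.07 pH)
I_C = 1.0          # critical current in mA
G, H = 1.0, 0.05   # damping Phi_0/R (arbitrary) and pseudo-time step

def F(x):
    return I_C * math.sin(2 * math.pi * x) + K * x

def J(x):
    return 2 * math.pi * I_C * math.cos(2 * math.pi * x) + K

def newton_dc(x, y, max_iter=20, tol=1e-10, max_jump=0.1):
    """Source-stepping Newton solve of (7.1) via (7.2).

    Returns None when an ill-conditioned point is suspected: either the
    iteration does not terminate or one iteration changes x by too much.
    """
    for _ in range(max_iter):
        slope = J(x)
        if slope == 0.0:
            return None
        dx = (y - F(x)) / slope
        if abs(dx) > max_jump:
            return None
        x += dx
        if abs(dx) < tol:
            return x
    return None

def transient(x, y, eps=1e-9, max_steps=100000):
    """Backward Euler solve of (7.3), started from the last dc point x."""
    for _ in range(max_steps):
        x_new = x
        for _ in range(50):
            dx = (y - F(x_new) - G * (x_new - x) / H) / (J(x_new) + G / H)
            x_new += dx
            if abs(dx) < 1e-13:
                break
        x_dot = (x_new - x) / H
        x = x_new
        if abs(x_dot) < eps:
            return x
    raise RuntimeError("x_dot did not decay; possibly no dc solution")

def trace(sources, x=0.0):
    """Follow the dc solution along a list of source values."""
    points, n_transients = [], 0
    for y in sources:
        x_next = newton_dc(x, y)
        if x_next is None:          # fall back to the damped transient
            x_next = transient(x, y)
            n_transients += 1
        x = x_next
        points.append((y, x))
    return points, n_transients

# Sweep 0 -> 1.5 mA -> -0.4 mA -> 0 in 0.05 mA steps.
steps = list(range(0, 31)) + list(range(29, -9, -1)) + list(range(-7, 1))
curve, n_transients = trace([0.05 * n for n in steps])
```

In this sweep the transient branch should be needed only where the source crosses a hump of the characteristic; everywhere else the cheaper Newton step suffices.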


VIII. Accelerating the Decay of $\dot{x}$

It has been shown that the source-stepping method may fail
at points where the solution x is discontinuous with respect
to the source y. The mixed-mode method then uses a
time-domain calculation to side-step the ill-conditioned point. It
may take many time steps for the transient signals to decay to
within error tolerance if the circuit's time constants cover a
wide range of values. The time constants are determined by
$L_i(t)/R_i$, where $L_i$ is the effective incremental inductance
seen by the damping resistor $R_i$ introduced at node i. Since
we have the freedom to choose $R_i$, we can make $R_i$ time
varying, so that $|L_i(t)|/R_i(t)$ is the same at each node.
The absolute sign is needed because the effective incremental
inductance can be negative with active devices in the
circuit. It is nontrivial to find $L_i(t)$, but an estimation can be
made with minimum additional cost in computation.

We can use a simple model of the decay processes for node
i at time $t_0$:

$$g_i(t_0)\,\dot{x}_i(t_0) = f(x_i(t_0)) \qquad (8.1)$$

where $g_i = \Phi_0/R_i$. The model assumes that the damping
conductance $g_i$ sees a dynamic inductance, which is a good
approximation for a Josephson circuit. It is an approximation
because the damping conductances at other nodes are
neglected. Keeping $g_i$ constant from $t = t_0$ to $t = t_0 + \Delta t$,
we have

$$g_i(t_0)\,\dot{x}_i(t_0 + \Delta t) = f(x_i(t_0 + \Delta t)). \qquad (8.2)$$

Subtracting (8.1) from (8.2),

$$g_i(t_0)\,[\dot{x}_i(t_0 + \Delta t) - \dot{x}_i(t_0)] \approx \left.\frac{\partial f}{\partial x_i}\right|_{x_i(t_0)} [x_i(t_0 + \Delta t) - x_i(t_0)]. \qquad (8.3)$$

Replacing $t_0$ with $t_n$ and $t_0 + \Delta t$ with $t_{n+1}$, the
incremental inductance seen at node i during this period can
be estimated by

$$\frac{\Phi_0}{L_i[n+1]} = -\left.\frac{\partial f}{\partial x_i}\right|_{x_i(t_0)} \approx -g_i[n]\,\frac{\dot{x}_i[n+1] - \dot{x}_i[n]}{x_i[n+1] - x_i[n]}. \qquad (8.4)$$

The estimated time constant on the interval $t_n$ to $t_{n+1}$ is
$\tau_i[n+1]$, where

$$\tau_i[n+1] = g_i[n]\,L_i[n+1]/\Phi_0. \qquad (8.5)$$

To get an effective desired time constant of $\tau_d$, we simply set
$g_i[n+1] = |\tau_d \Phi_0 / L_i[n+1]|$, and the new $g_i$ is used
for the calculation of $x_i[n+2]$. Equation (7.4) is then
modified by replacing G with G[n]. The additional arithmetic
operations are insignificant compared to the number of
operations required to solve (7.4).
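The per-node update in (8.4) and (8.5) amounts to a few arithmetic operations per step. The sketch below is one possible reading of those equations under the simple decay model of (8.1); the function name, the clamping limits `g_min` and `g_max`, and the handling of a zero flux change are assumptions added for robustness and are not taken from the text.

```python
def update_conductance(g_n, x_n, x_np1, xdot_n, xdot_np1, tau_d,
                       g_min=1e-6, g_max=1e6):
    """Return g_i[n+1] that gives node i the desired time constant tau_d.

    g_n:       damping conductance Phi_0/R_i used on the step n -> n+1
    x_*:       normalized node flux at steps n and n+1
    xdot_*:    its pseudo-time derivative at steps n and n+1
    The estimate Phi_0/L_i[n+1] follows the finite-difference form of (8.4).
    """
    dx = x_np1 - x_n
    if dx == 0.0:
        return g_n  # no flux change: keep the previous damping (assumed policy)
    phi0_over_l = -g_n * (xdot_np1 - xdot_n) / dx
    g_new = tau_d * abs(phi0_over_l)
    return min(max(g_new, g_min), g_max)

# Check on a linear node, g * x_dot = -k * x, whose true Phi_0/L is k.
k, g = 4.0, 2.0
x_a, x_b = 1.0, 0.5
g_next = update_conductance(g, x_a, x_b, -k / g * x_a, -k / g * x_b, tau_d=0.5)
```

For the linear check case the estimate recovers `k` exactly, so `g_next / k` equals the requested time constant.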
IX. Detection of Deviation from Zero-Voltage State

A criterion is needed to indicate when a computation is
leaving the regime of source values for which dc solutions
exist. A simple case is when a single junction is driven by a
current exceeding the critical current $I_c$; there cannot be a dc
solution. Therefore, the Newton iteration in source stepping,
after the source has been stepped above $I_c$, will not
converge, and the transient mode is entered. The criterion used
in this procedure is to monitor the variable $\dot{x}$. If it does not
converge to less than a preselected value $\epsilon$ within some
multiple N of the largest of the average time constants of the
circuit, it is assumed that the source vector y has left the regime
of dc solutions. Thus in the transient mode, we can keep
track of the average time constant $\bar{\tau}_i = [\sum_n \tau_i[n]\,h_n]/T$ and
let $\tau_{max} = \max_i \bar{\tau}_i$. When the integration time exceeds
$N\tau_{max}$, if $\dot{x}$ is still much more than $\epsilon$, we conclude that the
circuit is in the voltage state. The constant N is picked on the
basis of the desired resolution.
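The time-out criterion can be illustrated on the single-junction case mentioned above. This sketch simplifies the procedure: instead of accumulating the average time constants during the run, it takes `tau_max` as a given parameter, which is an assumption, as are the function name and all default values (currents in units of the critical current).

```python
import math

def reaches_dc(i_s, i_c=1.0, g=1.0, h=0.05, eps=1e-6, n_mult=50, tau_max=0.2):
    """Integrate a damped single junction, i_c sin(2 pi x) + g x_dot = i_s.

    Returns True if |x_dot| falls below eps before the pseudo-time exceeds
    n_mult * tau_max, and False if the junction is judged to be in the
    voltage state. tau_max is supplied by the caller in this sketch.
    """
    x, t = 0.0, 0.0
    while t < n_mult * tau_max:
        x_new = x
        for _ in range(50):  # Newton solve of the backward Euler step
            f = i_c * math.sin(2 * math.pi * x_new) + g * (x_new - x) / h - i_s
            df = 2 * math.pi * i_c * math.cos(2 * math.pi * x_new) + g / h
            dx = -f / df
            x_new += dx
            if abs(dx) < 1e-13:
                break
        x_dot = (x_new - x) / h
        x, t = x_new, t + h
        if abs(x_dot) < eps:
            return True
    return False
```

A source below the critical current settles well within the allotted time, while a source above it keeps the flux advancing and the function reports no dc solution.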


Fig. 7. The one-junction SQUID characteristic curve with $I_c$ = 1 mA and
L = 2.07 pH.


X.  Adapting  to  Threshold  Curve  Calculation 

As described at the beginning, the threshold curve is the
locus of ill-conditioned points and voltage-state boundary
points, representing local and global maximum points,
respectively. Ill-conditioned and voltage-state boundary points
are detected when the source stepping cannot proceed
further. An adaptive technique of picking source step sizes
should be used; the increment and decrement steps taken by
the sources should be reduced when a possible ill-conditioned
point is approached. The accuracy of the calculation will be
determined by the smallest source step, which should be set
by the user. The sources should be stepped in a systematic
way to stay near the threshold for efficient tracing of the
threshold curve. The mixed-mode method has an advantage
over the Schulz-DuBois continuation method [10] in that it
can cross ill-conditioned points. This allows the operating
point to move from one threshold lobe to a neighboring lobe,
and makes a systematic search for all lobes much easier. A
difficulty in applying the threshold calculation technique to a
large circuit is determining whether and when the entire
threshold curve has been found, but this is true for all
threshold-curve calculation methods.


XI.  Case  Studies 

The mixed-mode method with adaptive resistive damping
has been implemented. We present two simple cases here, so
that the reader can easily implement the methods discussed
and verify the result. The first case is the one-junction
SQUID circuit discussed in previous sections; the computer
calculation of its characteristic curve is shown in Fig. 7. The
curve is traced out with the source first raised from 0 to 1.5
mA, then lowered past 0 to -0.4 mA, and finally brought
back to zero. The computer-calculated curve matches the
characteristic curve shown in Fig. 5 without the unstable part
A-B. For the one-junction SQUID, the unstable part, which
has a negative slope, can be determined by inspection of the
curve, but this is not true for many other cases, one of which
is presented next. The quantum flux parametron (QFP)
circuit proposed by Harada et al. [14] is shown in Fig. 8. Half
of a full period of the characteristic curve of this QFP,
calculated using our method, is shown in Fig. 9. The other
half of the period is just a symmetric reflection of the one
shown about an excitation current of 0.57 mA. The isolated
part of the characteristic curve shown in [14] is of elliptical
shape, and analysis can be done to show that the top half of
the ellipse belongs to the unstable part of the characteristic
curve. Our mixed-mode method finds only the stable solutions,
which are the physically possible dc operating points.

XII. Summary


We have discussed a formulation for obtaining dc solutions
of Josephson circuits. An efficient method using the
combination of source stepping and transient calculation with resistive
damping is presented. Adaptive resistive damping is used to
equalize the time constant at each node and reduce the
computation cost. The calculation of time constants also
provides a way to differentiate between the zero and nonzero
voltage states. An adaptation of the method to SQUID
threshold curve calculation is also discussed. The techniques are
suitable for inclusion in a general simulation program
such as JSIM or SPICE and are presented in sufficient detail
for a reader to implement them.

Appendix
Device Templates


In our formulation, current is always assumed to flow from
the positive node to the negative node, and the direction of
leaving a node is considered the positive current direction for
that node.

Template for Uncoupled Inductor: For an uncoupled
inductor with inductance L between node i+ and node i-,
the contributions of current at the two nodes are given by

$$\text{left side} = \begin{bmatrix} \Phi_0/L & -\Phi_0/L \\ -\Phi_0/L & \Phi_0/L \end{bmatrix} \begin{bmatrix} x_{i+} \\ x_{i-} \end{bmatrix}.$$

Fig. 8. A QFP design by Harada et al. [14].

Fig. 10. Example circuit.

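Template for Independent Current Source: For an
independent current source with value $I_s$ between node i+ and
i-:

$$\text{right side} = \begin{bmatrix} -I_s \\ I_s \end{bmatrix}.$$

Template for Flux-Controlled Current Source: For a
flux-controlled current source with control nodes at ic+ and
ic-, controlled source between node i+ and i-, and a
flux transconductance G:

$$\text{left side} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ G & -G & 0 & 0 \\ -G & G & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{ic+} \\ x_{ic-} \\ x_{i+} \\ x_{i-} \end{bmatrix}.$$

Template for Josephson Junction: For a Josephson
junction between nodes i+ and i- with critical current $I_c$:

$$\text{left side} = \begin{bmatrix} I_c \sin 2\pi\delta x_i \\ -I_c \sin 2\pi\delta x_i \end{bmatrix}$$

where the definition $\delta x_i = x_{i+} - x_{i-}$ has been used. The
template for the linearized equation in the form of $Ax^{k+1} = b$ is:

$$\text{left side} = \begin{bmatrix} 2\pi I_c \cos 2\pi\delta x_i^k & -2\pi I_c \cos 2\pi\delta x_i^k \\ -2\pi I_c \cos 2\pi\delta x_i^k & 2\pi I_c \cos 2\pi\delta x_i^k \end{bmatrix} \begin{bmatrix} x_{i+}^{k+1} \\ x_{i-}^{k+1} \end{bmatrix}$$

$$\text{right side} = \begin{bmatrix} -I_c \sin 2\pi\delta x_i^k + 2\pi\delta x_i^k I_c \cos 2\pi\delta x_i^k \\ I_c \sin 2\pi\delta x_i^k - 2\pi\delta x_i^k I_c \cos 2\pi\delta x_i^k \end{bmatrix}.$$

Example: Using the templates, we can write down the
KCL equations for the circuit shown in Fig. 10 by inspection.
At node 1:

$$\frac{\Phi_0}{L}(x_1 - x_2) + I_{c1} \sin 2\pi x_1 = -I_s.$$

At node 2:

$$\frac{\Phi_0}{L}(x_2 - x_1) + I_{c2} \sin 2\pi x_2 = 0.$$

The entries to matrices A and b of (4.3) are

$$A = \begin{bmatrix} \Phi_0/L + 2\pi I_{c1} \cos 2\pi x_1^k & -\Phi_0/L \\ -\Phi_0/L & \Phi_0/L + 2\pi I_{c2} \cos 2\pi x_2^k \end{bmatrix}$$

$$b = \begin{bmatrix} -I_s - I_{c1} \sin 2\pi x_1^k + 2\pi I_{c1} x_1^k \cos 2\pi x_1^k \\ -I_{c2} \sin 2\pi x_2^k + 2\pi I_{c2} x_2^k \cos 2\pi x_2^k \end{bmatrix}.$$

Fig. 9. The QFP characteristic curve with I, = 20 pA (horizontal axis:
excitation current in mA).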
Abstract: With the continuous increase of complexity of
superconducting integrated circuits, the demand for
computer-aided-design tools is rising. Circuit extraction from layout to
simulation is an important phase of an IC design. It verifies the
circuit design by identifying circuit devices, checking
connectivity, and calculating design parameters. This paper presents
an extractor, INDEX, designed to extract superconducting
circuits from layout. The inductances of the superconducting lines
are calculated by a set of analytical models. These self- and
mutual-inductance models are generated from a series of
numerical simulations and a linear programming curve-fitting.
INDEX is based on the MAGIC layout system.

I.  INTRODUCTION 

A circuit extractor has two phases. The first is called
netlist extraction, in which devices like transistors and
Josephson junctions are identified and connected regions (called nets)
are determined. The second phase is called parameter
extraction, which involves calculating electrical parameters of
devices and nets, such as the width-to-length ratio of a
transistor gate, Josephson junction critical currents, and the
capacitance, inductance, and resistance of the net. The uniqueness of
the superconducting IC is that the interconnection does not
have resistance and it is modeled as an inductor in the circuit.
The inductance of the interconnection affects not only the
circuit's normal operation but also its operating margins. Thus, it
is important that the superconducting extractor accurately
extract the interconnection inductance.

There are three ways to calculate inductance, resistance
and capacitance in an extractor: numerical calculation by
solving Laplace's equation [1], lumped-model approximation [2],
and analytical modeling [3]. The numerical solution is the most
accurate one, but it is impractical for a large circuit because of
its high computational cost. The lumped approximation
method is the fastest one but it is the least accurate because it
neglects the geometrical detail and fringing effects. Analytical
modeling provides a good trade-off between speed and
accuracy and this is the method we adopted in INDEX. In this
method, nets are first decomposed into simple rectangles and,
according to their positions, different inductance models are
applied.


This work is supported by the U.S. Air Force Contract No. F19628-
50-X-0037 and the DoD University Research Initiative. Manuscript
received August 24, 1992.


In this paper, we will first present our method of
generating analytical models for various inductor configurations
based on numerical simulation. Then we discuss issues
involved in the inductance extraction: current direction
identification, self- and mutual-inductance calculation, network
simplification, and extraction results and speed. Finally, we
point out possible future improvements.

II. INDUCTANCE MODELING

A.  Analytical  Modeling 

There are two practical approaches to building models of
inductance in layout. One is to construct analytical models
for inductors in different geometric configurations from
experimental measurements or numerical calculations. The
other is to form a look-up table. In the look-up table
approach, the memory requirements grow very rapidly with
the increase of the number of parameters describing a given
inductor configuration and of the range of interest for each
parameter, even though sophisticated interpolation methods
can be used to reduce the memory storage space. A more
attractive way is to generate analytical models. The
electromagnetic world is continuous, so inductance values should
vary smoothly with the change of inductor geometries;
hence, a compact analytical formula can be fitted to a wide
range of numerical or experimental data. Furthermore,
analytical models are easy to interpret. Circuit designers can
gain insight into the change of model value with the change
of circuit parameters.

B.  Numerical  Simulation 

We base our inductance models on numerical simulation
because it is convenient to generate a large range of data for
various geometrical configurations and it is considerably
faster than doing experiments. There are a few different
numerical algorithms available to calculate the inductance of
superconducting lines. A fairly efficient approach is to use
the Lagrangian variational method [4]. Even though it is
only based on a two-dimensional numerical algorithm, its
results are found to be rather close to the experimental data
[5] and it is reasonably fast because it exports the inductance
matrix without calculating the current distribution. There are
old FORTRAN programs based on this algorithm, but in
order to make the simulation compatible with the inductor
model generation system, we implemented it in the C
language and improved its memory allocation.

C. Model Fitting and Model Generation

From the above simulation results, we generate inductance
values for a set of design parameters. We considered the
following configurations: (1) single line over the ground plane,
(2) two coupled lines on the same level, (3) two coupled lines
on different levels. In case (3), depending on their relative
positions, they are further divided into subconfigurations (see
Fig. 1).

In each configuration, the form of the analytical model is
guided by physical considerations. For example, for a single
line over a ground plane, the inductance gets smaller with the
increase of its line width. So the fitting formula is chosen as a
polynomial of 1/width. Let us assume that the calculated
values are $s(D_k)$ for a set of parameters $D_k$ (k = 1, ..., M), where M
is the number of the fitting data points. The fitting model is

$$m(D_k) = \sum_{i=1}^{N} a_i f_i(D_k) \qquad (1)$$

where the $\{a_i\}$ are the fitting coefficients, $f_i$ is a certain
function of parameters and N is the number of coefficients. So
the problem of optimizing the model formula can be stated as
minimizing the maximum relative error $E_{rmax}$, which is
defined as:

$$E_{rmax} = \max_k \frac{|m(D_k) - s(D_k)|}{s(D_k)}. \qquad (2)$$

Since $m(D_k)$ is a linear function of coefficients $\{a_i\}$, this can be
considered as a linear programming problem with variables
$\{a_i\}$ and $E_{rmax}$, to minimize $E_{rmax}$. A standard linear
programming package based on the simplex method is called [6].
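For reference, one standard way to write this minimax fit as a linear program is sketched below. It assumes every simulated value $s(D_k)$ is positive, so the absolute value in the relative error can be split into two linear inequalities; the symbol $\varepsilon$ stands for $E_{rmax}$. This is a textbook formulation and not necessarily the exact form passed to the package cited as [6].

```latex
\begin{aligned}
\min_{a_1,\dots,a_N,\;\varepsilon} \quad & \varepsilon \\
\text{subject to} \quad
& \sum_{i=1}^{N} a_i f_i(D_k) - \varepsilon\, s(D_k) \le s(D_k),
  && k = 1,\dots,M, \\
& -\sum_{i=1}^{N} a_i f_i(D_k) - \varepsilon\, s(D_k) \le -s(D_k),
  && k = 1,\dots,M, \\
& \varepsilon \ge 0 .
\end{aligned}
```

The program has N + 1 variables and 2M inequality constraints, and the optimal $\varepsilon$ is the smallest achievable maximum relative error for the chosen basis functions $f_i$.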


We combined all the above programs into a program
called INDMOD to automatically generate various
models. INDMOD reads in a process description file which
specifies the process parameters, such as the number of
layers, their thicknesses and the separations of the layers, and
outputs the self-inductance and mutual-inductance model for
each layer. Various models for the Berkeley niobium process
have been developed but are not included here due to limited
space. All the models are fitted within 10% of the simulated
value and many are more accurate than this.

III. SUPERCONDUCTING CIRCUIT EXTRACTION

A. MAGIC Layout System

MAGIC is an appropriate layout tool in which to
implement our inductance extraction because it is a widely used
system which has interfaces with other intermediate layout
formats, such as CIF and Calma. MAGIC's corner-stitch data
structure also makes the extraction much simpler. In MAGIC,
polygons are represented by rectangles, called tiles. Each tile
has four pointers to four neighbors (see Fig. 2). This makes
neighbor-related operations easy to implement. For example, it
supports continuous design-rule checking and incremental and
hierarchical extraction [2].

MAGIC's extractor can extract transistor properties, and
the capacitance and resistance of the nets. These are usually
enough to cover the needs of semiconductor circuit design. But
the original extractor only extracts the lumped capacitance and
resistance of a net. This is not sufficient for a superconducting
circuit netlist. An improvement was included in the newest
release of MAGIC which can extract a detailed RC network of
a net [7], but it requires a preprocessing of the original
extractor. In our approach, we implemented INDEX without
requiring a lumped model pre-extraction. The following few sections
highlight the main procedures of INDEX.




B.  Circuit  Flattening  and  Resolving  Contacts  and  Junctions 


Fig. 1 Inductor configurations analyzed. The shaded
inductors are the ground planes.


Before the circuit extraction starts, some preprocessing is
done. In the current version of INDEX, a hierarchical circuit is
flattened out. All the subcells in a parent cell are pushed to
their parent cell level recursively and the overlaps are merged.
This avoids the complexity of calculating mutual inductance
coupling among a parent cell and its subcells. The flattened cell
is deleted at the end of the extraction.

Fig. 2 The corner-stitch data structure used in MAGIC.
Each tile has four pointers pointing to its neighbors.

Next, contacts and Josephson junctions are identified.
A layer contact or a Josephson junction to the layer is a
current source or drain for that layer and can change the current
flow dramatically. Each tile is marked as having a point with
the information of the position of the contact or junction if
there is a contact or junction on it; we call that point a break
point. Then the tiles representing contacts are removed and
complex superconducting layer tiles around contacts are
merged.

C.  Rectangle  Decomposition 

Before we can call any model formula to calculate an
inductance value, the current direction and the superconducting
line width and length information are needed. The basic
algorithm breaks a complex net into rectangles and calculates
inductance individually for each rectangle, and then adds them
up.

The MAGIC system represents objects in maximally
horizontally merged tiles (see Fig. 3a). To find out the current
direction, tiles have to be rearranged. First, narrow horizontal
tiles are merged into their neighbors (Fig. 3b). The second
sorting cuts every tile which has a neighbor on its longer side.
We get to the situation in Fig. 3c, where the current direction in
a tile can be best determined by the positions of its neighbors
and the positions of the break points it has:

1. If a tile has no neighbor or break point, then it is a
floating node. Nothing is done to it.

2. If a tile has one neighbor or break point, it is also a
floating node. But in some cases, it is an external terminal, so it is
processed assuming that the current is coming from the side
farthest away from the break point or the neighbor position.

3. If a tile has two neighbors, two break points or one of
each, then the current must be flowing between them.
Depending on their relative positions, the current direction is
categorized as horizontal, vertical, or mixed. For example, if one
neighbor is on the left side and one is on the right side, the
direction is horizontal. If one is above and one is on the
left, the direction is mixed. For a mixed inductor, we assume its
inductance is a constant fraction of the inductance if it were
vertical or horizontal. This constant is chosen as 0.7 based on
experimental experience.

4. If a tile has more than two neighbors or break points, or
their combinations, the inductance problem is, in fact,
three-dimensional. The model we developed from two-dimensional
simulation will be inaccurate. So crude approximations are
made here. The extraction is more accurate if those complex
tiles only introduce a small fraction of the total inductance. We
approximate by first assuming a current direction based on the
position of its break points and neighbors. Then their positions
are either sorted horizontally or vertically and the inductance
rectangles are assigned to a pair of close break points.

Because we cannot determine a tile's inductance without
finding its mutual coupling with its neighbors, in this pass each
inductor is only identified and associated with a rectangle
area; the calculation of the self-inductance value is postponed
to the mutual coupling module.
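The four cases above can be condensed into a small classification routine. This is a simplified sketch and not the INDEX source: it represents each neighbor or break point only by the tile side it lies on, the function name and return format are hypothetical, and the treatment of two connections on the same side as "mixed" is an assumption the text does not address.

```python
MIXED_FACTOR = 0.7  # constant fraction quoted in the text for mixed inductors

VALID_SIDES = {"left", "right", "top", "bottom"}

def classify_tile(sides):
    """Classify a tile from the sides holding its neighbors and break points.

    sides: list with one entry per neighbor or break point, each one of
    'left', 'right', 'top', 'bottom'.
    Returns a dict with the category number used in the text (1 to 4), an
    assumed current direction, and the inductance scale factor.
    """
    for side in sides:
        if side not in VALID_SIDES:
            raise ValueError("unknown side: %r" % (side,))
    count = len(sides)
    if count == 0:
        # Floating node: nothing is done to it.
        return {"category": 1, "direction": None, "factor": 0.0}
    if count == 1:
        # Possible external terminal: current assumed to come from the far side.
        direction = "horizontal" if sides[0] in ("left", "right") else "vertical"
        return {"category": 2, "direction": direction, "factor": 1.0}
    if count == 2:
        pair = set(sides)
        if pair == {"left", "right"}:
            return {"category": 3, "direction": "horizontal", "factor": 1.0}
        if pair == {"top", "bottom"}:
            return {"category": 3, "direction": "vertical", "factor": 1.0}
        return {"category": 3, "direction": "mixed", "factor": MIXED_FACTOR}
    # More than two connections: the problem is three-dimensional and only a
    # crude approximation is made, so no single factor applies.
    return {"category": 4, "direction": "approximate", "factor": None}
```

For example, `classify_tile(["top", "left"])` yields a mixed inductor scaled by 0.7.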

D.  Self-  and  Mutual-Inductance  Calculation  and  Network  Sim¬ 
plification. 

Tiles, with the information on the current flows, dimensions,
and positions of their inductors, are now checked to see if
there is any mutual coupling among them. Both different
superconducting layers and the same layer are searched, but if the
separation of two inductors exceeds a certain limit, the coupling
is neglected. Coupling is also neglected if two inductors
are perpendicular to each other.
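
The two rejection rules above (separation beyond a limit, and
perpendicular orientation) can be sketched as a simple filter.
This is an illustration under stated assumptions rather than the
INDEX implementation: the Inductor record, the in-plane
rectangle-gap distance, and the max_separation parameter are all
hypothetical, and layer stacking is ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Inductor:
    """Hypothetical record: an inductor rectangle and its current direction."""
    name: str
    direction: str  # "horizontal", "vertical", or "mixed"
    x0: float
    y0: float
    x1: float
    y1: float


def separation(a: Inductor, b: Inductor) -> float:
    """In-plane gap between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return math.hypot(dx, dy)


def may_couple(a: Inductor, b: Inductor, max_separation: float) -> bool:
    """Return True if the pair should be kept for mutual-coupling analysis."""
    # Perpendicular inductors are neglected.
    if {a.direction, b.direction} == {"horizontal", "vertical"}:
        return False
    # Pairs whose separation exceeds the limit are neglected.
    return separation(a, b) <= max_separation
```

The text does not give the value of the separation limit, so
max_separation is left as a caller-supplied parameter.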


Fig. 3 Rectangle decomposition. A net, which has a shape
in (a), is processed through (b) and (c). The cross-
hatching in (c) indicates the inductor categories (2, 3,
and 4) discussed in the text.


After this mutual coupling search, all the inductors or parts
of them are categorized into different coupling models as in
Fig. 1. If one inductor has two types of coupling, the effects
are added up.

The inductors we get by this stage form a very complex
circuit network due to the above-described rectangle decomposition
procedure. The network is greatly simplified in this module.
All the serial and parallel inductors are merged and
Δ-shaped networks are transformed into Y-shaped networks.
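
The three reduction steps named above can be sketched with the
standard formulas for uncoupled inductors. This is a minimal
sketch, assuming no mutual coupling among the inductors being
merged; the text does not say how INDEX treats coupled inductors
during simplification, and the function names are hypothetical.

```python
def series(l1: float, l2: float) -> float:
    """Two inductors in series add."""
    return l1 + l2


def parallel(l1: float, l2: float) -> float:
    """Two uncoupled inductors in parallel combine like parallel impedances."""
    return l1 * l2 / (l1 + l2)


def delta_to_wye(l_ab: float, l_bc: float, l_ca: float):
    """Convert a delta (a-b, b-c, c-a) into a wye (a, b, c to a new center node).

    Each wye branch is the product of the two delta branches that meet at
    that node, divided by the sum of all three delta branches.
    """
    total = l_ab + l_bc + l_ca
    l_a = l_ab * l_ca / total
    l_b = l_ab * l_bc / total
    l_c = l_bc * l_ca / total
    return l_a, l_b, l_c
```

A quick consistency check is that the inductance seen between
nodes a and b is unchanged: in the delta it is l_ab in parallel
with (l_bc + l_ca), and in the wye it is l_a + l_b.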

IV.  CASE  TESTS  AND  RESULTS 

INDEX has been tested on a number of cases with good
results. Like the original extraction tool in MAGIC, INDEX
first produces an extracted file from the layout containing
information about the Josephson junctions, inductors, and
resistors. After this, a tool called extljsim is used to transform
the circuit from the extraction format to the JSIM format. After
the user puts in the junction model and the external current or
voltage sources, JSIM can be run to check the design. To help
the user interpret the extracted circuit, the extracted
inductors are shown with their names on the layout screen.

Fig. 4 Layout of a two-junction SQUID.

A two-junction SQUID (Fig. 4) is used as one test case. Its
JSIM output deck is shown in Fig. 5, and its schematic diagram
is shown in Fig. 6. We can see that INDEX extracts not only the
loop inductances of a SQUID, but also the parasitics. The
hand-calculated values of the total loop inductance, control line
inductance, and mutual inductance are 7.0 pH, 8.0 pH, and 4.2 pH,
respectively, which are very close to the extracted values of
6.7 pH, 7.9 pH, and 3.9 pH.
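
To put a number on "very close", the relative deviation of each
extracted value from its hand calculation can be computed
directly from the figures quoted above. The dictionary keys and
the function name are illustrative only.

```python
# Values in pH, taken from the comparison in the text.
hand = {"loop": 7.0, "control_line": 8.0, "mutual": 4.2}
extracted = {"loop": 6.7, "control_line": 7.9, "mutual": 3.9}


def relative_deviation(extracted_ph: float, hand_ph: float) -> float:
    """Signed deviation of the extracted value, as a fraction of the hand value."""
    return (extracted_ph - hand_ph) / hand_ph


deviations = {
    name: relative_deviation(extracted[name], hand[name]) for name in hand
}

for name, dev in deviations.items():
    print(f"{name}: {100 * dev:+.1f}%")
```

All three extracted values are lower than the hand calculations,
with the mutual inductance showing the largest deviation, at
about 7%.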

INDEX was also tested on a part of a 5-to-32 bit serial
decoder [8]. It extracted 38 junctions, 2307 inductors (before
network simplification), 345 inductors (after network
simplification), and 81 resistors in 5 seconds. A large portion of
the time is actually spent on illustrating inductors on the layout
screen. This shows that INDEX is sufficiently fast for the
current level of superconducting circuit design.

B1 4 6 jmod area 2
B2 5 2 jmod area 2
R1 10 11 5.2
L1 7 8 7.9pH
L2 1 0 1 JpH
L3 3 9 5.3pH
L4 4 9 0.6pH
L5 1 6 0.8pH
L6 3 10 0.3pH
L7 3 2 0.6pH
L8 6 1 0.8pH
L9 11 9 0.5pH
K1 L1 L3 0.595

Fig. 5 JSIM input deck of the extracted two-junction
SQUID.

Fig. 6 Schematic diagram of the extracted two-junction
SQUID.

V. CONCLUSION

In this paper, we present a superconducting circuit extraction
tool based on the MAGIC database. It can extract a
superconducting netlist along with the areas of Josephson junctions,
self- and mutual-inductances, and resistances. The calculation
of the extracted inductances is based on the analytical model
developed through numerical simulation.

A number of improvements can be made on INDEX. First,
to accurately calculate parasitic inductances, a 3D inductance
simulation tool is needed. Second, a hierarchical extraction can
be incorporated into INDEX to speed up the extraction and
keep the modularity of a large hierarchical circuit. Last, to help
the designer interpret the extracted circuit, an automatic
schematic generation tool or a netlist comparator is also desired.
MISSION

OF

ROME LABORATORY


Mission. The mission of Rome Laboratory is to advance the science and
technologies of command, control, communications and intelligence and to
transition them into systems to meet customer needs. To achieve this,
Rome Lab:


a. Conducts vigorous research, development and test programs in all
applicable technologies;

b.  Transitions  technology  to  current  and  future  systems  to  improve 
operational  capability,  readiness,  and  supportability; 

c.  Provides  a  full  range  of  technical  support  to  Air  Force  Materiel 
Command  product  centers  and  other  Air  Force  organizations; 

d.  Promotes  transfer  of  technology  to  the  private  sector; 

e. Maintains leading edge technological expertise in the areas of
surveillance, communications, command and control, intelligence, reliability
science, electro-magnetic technology, photonics, signal processing, and
computational science.


The  thrust  areas  of  technical  competence  include:  Surveillance, 
Communications,  Command  and  Control,  Intelligence,  Signal  Processing, 
Computer  Science  and  Technology,  Electromagnetic  Technology, 
Photonics  and  Reliability  Sciences. 


